Okay, hello and welcome to our webinar this afternoon on languages spoken in the UK. We're going to have presentations from Alita Nandi from the University of Essex and from myself, and we've got a video from Jemima Stockton that we're going to try to play. We haven't tried playing a video before in one of these webinars; if it doesn't work, we've got some slides that we can switch to. I should say that this webinar is one in a series run by representatives of a wide range of data resources funded by the ESRC, and all of the webinars in this series have been recorded and are available on the UK Data Service YouTube channel. Recently we've had webinars on data about mental health and, last week, on data about religion, and there's one next week on data about political behaviour. So now I'd like to hand over to Alita Nandi, who's going to talk about data in Understanding Society.
As was just said, I'm Alita Nandi. Thank you for attending this webinar. I'm going to talk to you about the language questions in Understanding Society. Before I tell you about the questions, I'll tell you a little about the study itself. So what is Understanding Society? It is a survey of UK households. These households have been randomly selected from across the UK, and the survey collects information about all members of these sample households every year. So the same individuals are interviewed repeatedly, and even if they move, they're followed as long as they are within the UK. New members who join these households are also interviewed for as long as they live with the sample members. That is why it is a household longitudinal survey of the UK. Now, what questions do we ask? Some questions are asked every year, while others are asked at regular intervals of two, three or four years. This allows researchers like you to understand how the lives of UK residents change over time and across their life course.
But questions about respondents' background that will not change over time, like country of birth or date of birth, are asked only once. Finally, some questions are triggered by certain events. For example, if a respondent says that they have a three-year-old child, certain questions about that child are triggered. So specifically, what types of questions are asked? Everyone aged 16 or above in these households is considered an adult for survey purposes and is eligible for what we call the adult interview every year. They are asked about almost every aspect of their lives: in addition to age, sex, ethnic group and culture, they're asked about their family, partnerships, children, family background, education, employment, income, health and well-being, attitudes and so on. So if you can think of an area, we've covered it. Young people aged 10 to 15 are also interviewed, but they get a much shorter, self-completion questionnaire. Its questions deal with issues relevant to this age group: computer and social media use, relationships with family members and friends, dating behaviour, health and happiness, and bullying. We also collect information about zero- to nine-year-olds, but they are not directly interviewed; information about them is collected from their parents or guardians. In addition to the data collected directly from respondents as part of the survey, there is additional data that you can match onto the survey data. One example: at one point in time, after the second and third waves, nurses were sent to the homes of respondents who had given consent.
The nurses collected data directly from respondents about their health and biomarkers, such as height, weight, grip strength, waist circumference, blood pressure and so on. If you are interested in using that data, this is the link you should follow. Interviewers also provide data about the quality of the interview and the interview process. Additionally, we provide information about respondents' residential location, such as the LSOA or parliamentary constituency, and using that you can link to geography-based data sets like the census, electoral information and so on. Similarly, information about school location is provided, which you can link to school data sets. Finally, the National Pupil Database has been linked for individuals in our survey who gave us consent and whom we could match. If you need more information about all these linked data sets, you can follow this link. In general, if you want to know more about the survey, you can go to our website. There you can read about research that others have conducted using this study on different topics, read the user guide, the FAQs and the questionnaires, and search for variables. If you still need more information, you can ask us. You can access our online training as well as attend our interactive virtual training sessions, and you can watch recorded webinars like this one and various training videos. So how do you find out what questions are asked about language? The easiest way is to follow this link, which will take you to a variable search page like this. There, you can type "language" and it will bring up any variable that has the word "language" in its label, in its name or in its question text. So it will basically bring up all the language-related variables. Now, there are three purposes for collecting language data in surveys. The first is to measure proficiency in a particular language.
The second is to identify language-based ethnocultural groups. And the third is to learn about the survey process: which language was the respondent answering in, was the interview translated, and so on. The first set of questions we have is about language proficiency. Respondents are asked whether English is their first language. If they say no, they are asked about their proficiency in English in various contexts: when conducting day-to-day activities, when talking over the phone, when reading formal documents and letters, and so on. This series of questions was asked for the first time in Wave 1 and is then asked at four-year intervals. Similar questions are also asked about Welsh language proficiency. The second type of question is the one that could be used to measure ethnocultural groups. In the second wave of the study, respondents were asked what the main language spoken at home during their childhood was, and they could choose from a very long list of languages. As this is not going to change over time, it is only asked once: after Wave 2, only new entrants to the survey are asked this question. What we found, as expected, is that 92% of respondents in Wave 2 said that English was the main language spoken at home during their childhood. I've not shown the English bar, because then all the other language bars would be so small you wouldn't be able to see them. As you can see, English is followed by Punjabi, Bengali, Gujarati and so on. One thing to note is that speaking English at home during childhood is not the same as migrant status: in fact, 38% of migrants said that they spoke English at home during their childhood. Another point to note is that the language spoken at home is also not the same as self-reported ethnic group: while 92% reported English as the main language, 88% reported their ethnic group as White British.
As I said, the third reason for asking language-related questions is to understand the interview process. There's a series of variables where we record whether the interview was translated from English. We allowed translations into nine languages. If a respondent wanted to answer in a language other than one of those, sometimes someone in the household translated for them. All that information is recorded in different variables, which you can find the way I showed you. There are also a couple of other questions where language appears: questions related to harassment, discrimination and identity also have a language component. For example, there's a series of questions asking respondents whether in the last 12 months they were physically or verbally abused or attacked. If they say yes, they're asked why they thought that happened, and one of the options is their language or accent. If you want to know what kind of research has already been done using this data, you can go to our publications page, which looks like this, and if you type "language" in the search, it will show you the list of publications that used language data from Understanding Society. One thing you will notice right away is that, unlike other topics, only 11 items were found. That's basically because this is a very underused part of our survey. Most of the papers that have used this data have only used the English-as-first-language or English proficiency variables, and most of them have looked at how English language proficiency affects the labour market performance or economic outcomes of ethnic minority or migrant groups. So the point I wanted to make is that this is a really underused part of the survey, which means there's a lot of scope for new research using this data. And that is the end of my talk. Thank you.
Keep in touch with us: you can sign up for our newsletter, follow us on Twitter and Facebook, and visit our YouTube channel, where we have some of these webinars and various training videos where you can learn more about the survey. That's it. Thank you.
Okay. Thank you, Alita. So I'm going to talk about questions about language in the census, and after that we'll have the presentation from Jemima about the census longitudinal studies, so I'm going to concentrate on parts of the census other than the longitudinal studies. By way of background, I saw this tweet recently from GCHQ saying that 42 languages were spoken across their workforce. I thought that was quite a good place to start, because it prompts the question of whether this is higher, lower or about the number of languages we might expect people to be speaking in a large employer. And what do they mean by languages spoken? Do they mean languages people are capable of using, or people's first languages, and so on? A related question that arose from thinking about that is this one: where is the most linguistically diverse place in the UK? There are obvious follow-on questions: what do we mean by this, what data can be used to answer it, and how easy is it to find an answer? I want to concentrate on what data can be used to answer this and what we can learn from the census about linguistically diverse places. The important thing to note about the census questions is that they were different in different parts of the UK. There have been long-standing census questions on traditional languages: on the use of Welsh, Scottish Gaelic, Irish and Ulster Scots, and Jemima's presentation is going to pick up on some of that. In 2011, there were also questions on the use of other languages, including the main language used at home or other languages used at home.
For people who indicated that their main language wasn't English, there was also a question, similar to the one in Understanding Society, about proficiency in English. On this slide I'm showing the questions on main language that were asked in the 2011 census. You can see that the questions asked in England, Northern Ireland and Wales were essentially the same: there was a question about whether your main language was English or, in Wales, whether your main language was English or Welsh, and if it was something else you were asked to write that language in. In Scotland, a different question was asked: do you use a language other than English at home? This doesn't say that it's your main language; it just asks about another language. So we might suspect that the results there will be slightly different, because it's asking a different question. So how do we find data about this? Well, I used InFuse, which is one of the tools provided by the UK Data Service. I think I'll show a URL later on, but it's census.ukdataservice.ac.uk that you need to go to, and there you can browse a wide variety of census data. As I mentioned, the questions in England, Wales and Northern Ireland were different from the question in Scotland, and in browsing for data in InFuse I was able to find a table on main language for which results are available for England, Wales and Northern Ireland. I've shown some screenshots on this slide of the process of selecting that table, selecting the cells within it that I want to use, and adding them to my planned output in the InFuse tool. I had to do a second run for data for Scotland, reflecting the fact that a different question was asked there. In both cases in InFuse, you select the variables of interest and the places for which you want to tabulate those results, and then you can download a CSV file and work with it on your own computer.
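Once downloaded, a table like this is often easier to work with in long form, with one row per area and language. A minimal sketch, assuming a hypothetical wide layout with one column per selected language; the area codes, names, column headers and counts are invented for illustration, and the headers in a real InFuse download will differ:

```python
import csv
import io

# Hypothetical wide extract: one row per area, one column per selected language.
# Real InFuse column headers will differ; check them in the downloaded file.
WIDE = """area_code,area_name,English,Polish,Urdu
W1,Ward One,9500,300,150
W2,Ward Two,4000,200,600
"""

# Columns that identify the area rather than holding a language count.
ID_COLUMNS = ("area_code", "area_name")


def wide_to_long(text, id_columns=ID_COLUMNS):
    """Reshape a wide CSV into (area_code, language, people) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        for column, value in row.items():
            if column in id_columns:
                continue  # skip identifier columns
            records.append((row["area_code"], column, int(value)))
    return records


long_rows = wide_to_long(WIDE)
for record in long_rows:
    print(record)
```

Because Scotland's question asked about other languages used at home rather than main language, it is probably safer to keep the Scottish rows in a separate table, or tag them with the question they came from, than to pool them with the England, Wales and Northern Ireland rows.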
The results available vary quite a lot in their detail. The table on this slide shows how many different languages are recognised in the results I was able to get from InFuse. For England and Wales, looking for results at ward level, there were 92 different languages tabulated. In Scotland there were just four; in Northern Ireland, 13. However, looking around on the Scotland Census website, I was able to find some additional tables as well, and the second of the tables mentioned on this slide is the one I've used for my analysis. It shows other languages used at home in Scotland by data zone, and in that case 17 different languages were recognised in the results. That's a lot fewer than the 92 in the data for England and Wales, but much more detailed than the original four. Using those, I was able to join together various results: tables from InFuse for different parts of the UK, and tables from ScotlandCensus.gov.uk for more detailed results in Scotland. I put them all together, mapped them using boundaries from borders.ukdataservice.ac.uk, and brought up a map like this. It shows, for each ward, the number of languages spoken by at least 1% of the ward population. I used that as a fairly arbitrary cut-off: I didn't want languages used by only one or two people, but languages used by a reasonable number of people. The thing to notice about this map is that the results look very different in Scotland from the rest of the UK. I think that's because a different question was asked in Scotland, so the results are not necessarily comparable. Again, a reminder that in Northern Ireland we had the same question as in England and Wales, but the range of result categories was a bit smaller.
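The 1% rule behind the map can be written down directly. A minimal sketch, assuming a hypothetical long-format table with one row per ward and language (column names and counts are invented), and assuming the language counts in each ward sum to the ward population. That holds for a single main-language question, but not for Scotland's "other language used at home" question, where you would need the ward population from a separate table:

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format extract: one row per ward and language.
# Column names and figures are illustrative, not real census output.
SAMPLE = """ward,language,people
W1,English,9500
W1,Polish,300
W1,Urdu,150
W1,French,50
W2,English,4000
W2,Punjabi,600
W2,Polish,200
W2,Bengali,150
W2,Somali,100
"""


def languages_over_threshold(rows, threshold=0.01):
    """Count, per ward, the languages used by at least `threshold` of its population.

    Assumes each person appears in exactly one language row, so the sum of the
    rows for a ward is that ward's population.
    """
    totals = defaultdict(int)
    by_ward = defaultdict(list)
    for row in rows:
        n = int(row["people"])
        totals[row["ward"]] += n
        by_ward[row["ward"]].append(n)
    return {
        ward: sum(1 for n in counts if n / totals[ward] >= threshold)
        for ward, counts in by_ward.items()
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = languages_over_threshold(rows)
most_diverse = max(counts, key=counts.get)  # first ward with the highest count
print(counts, most_diverse)
```

`max` returns only the first ward with the highest count, so with real data it is worth listing every ward that ties for the top value.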
This map allows us to explore the question I posed at the beginning about linguistic diversity in the UK. Looking at the UK-level map, we can see an obvious cluster in London, and this map zooms in on London and shows the number of languages spoken in wards in different parts of the city. But to answer the question of where the most linguistically diverse place is, if that concept really makes much sense: the answer is that City ward in Bradford was the ward with the largest number of different languages spoken by at least 1% of its population. Okay, so those results used the area statistics from the census. I also want to briefly show some results from the census microdata. There are a number of different microdata files. Some are secure, with strong access conditions. Others are safeguarded and much more easily available to academic researchers, but the safeguarded files are less detailed than the secure ones. Here is a brief summary of the census microdata files available for 2011. There are files about individuals and files about households; the household data is only available in the secure setting. But we have safeguarded data for 5% samples of the population of England and Wales, of Scotland and of Northern Ireland. If we look at the coding shown in this table, for the safeguarded data there are nine different languages that we can see in England and Wales, 11 in Scotland and 13 in Northern Ireland. If you were to use the secure data files, there is a very, very wide range of different languages recorded. As I mentioned earlier, there's a question, very similar to the one in Understanding Society, about proficiency in English for people who indicated that their main language wasn't English.
What we find at a national level from the results of that question, of course, is that for almost all languages, apart from a few with very small numbers of speakers, the vast majority of people who report that language as their main language also report good English proficiency. One of the things we can do with microdata, of course, is produce tabulations and cross-tabulations that aren't available in the standard area statistics. This graph shows the results for the question "How well can you speak English?", cross-tabulated by year of most recent arrival in the UK, for people in England and Wales who were not born in the UK. The bluish parts of the bars, over on the right-hand side, represent responses saying that people did not speak English well. The orange parts are people who said they did speak English well. Those coloured parts are for people who said their main language wasn't English; the grey parts are for people who said their main language was English. What we see is that even among the most recent arrivals in the UK, the vast majority of people say that they speak English well or very well, or that their main language is English. As we look further down the graph at earlier arrivals, we see that fewer and fewer people report problems speaking English, and also that fewer people say English is not their main language. The grey part of the graph gets bigger the further back we look: more and more of the people who arrived earlier reported English as their main language in 2011. Okay, that brings me to the end of the slides on data about languages and the census. What we're now going to try to do is play a set of slides that Jemima has recorded for us.
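The graph described above is a cross-tabulation expressed as row percentages: each arrival period is split into proficiency categories that add up to 100%. A minimal sketch with invented person-level records; the period and category labels are hypothetical stand-ins for the real microdata codes, and no survey weights are applied:

```python
from collections import Counter, defaultdict

# Invented person-level records: (arrival period, English proficiency category).
# "Main language English" stands in for people not asked the proficiency question.
records = [
    ("2007-2011", "Very well/Well"),
    ("2007-2011", "Not well/Not at all"),
    ("2007-2011", "Main language English"),
    ("2007-2011", "Very well/Well"),
    ("Before 1981", "Main language English"),
    ("Before 1981", "Main language English"),
    ("Before 1981", "Very well/Well"),
    ("Before 1981", "Main language English"),
]


def row_percentages(pairs):
    """Cross-tabulate (row, column) pairs and express each row as percentages."""
    table = defaultdict(Counter)
    for row, col in pairs:
        table[row][col] += 1
    return {
        row: {col: 100 * n / sum(cols.values()) for col, n in cols.items()}
        for row, cols in table.items()
    }


for period, shares in row_percentages(records).items():
    print(period, shares)
```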
We're going to skip over the description of what the LSs are, which is available in other webinars, and look at some of the questions about the use of different languages that have been asked in the census and are available in the longitudinal studies, in particular questions about native languages. This is a summary for all parts of the UK. We can see that for questions about Welsh, we have data in the longitudinal study from 1971 onwards; there's data about Scottish Gaelic from 1981 onwards in the Scottish LS; and there's data about the use of Irish in the Northern Ireland Longitudinal Study from 1991 onwards. Over on the right-hand side, we can see that as well as the questions about native language use, there are also the questions I mentioned in the earlier slides about language use more generally. Now we're seeing a question from the census form in England and Wales about people's capability with Welsh: can you understand spoken Welsh, speak Welsh, read Welsh, write Welsh, or none of the above? The census form was, of course, also available in Welsh for Welsh speakers. There's a similar sort of question in the census form in Scotland, although you can see that it has been structured in a different way: it asks whether you can understand, speak, read or write English, Scottish Gaelic or Scots. Again, we see the questions that we saw before on how well people can speak English, and the question I've shown before about whether you use a language other than English at home.
To demonstrate how the Welsh question can be used, Jemima has taken some research that I did with Nicola Shelton on the Welsh Government policy of a million Welsh speakers by 2050, which looked at how we can use the Welsh language questions in the LS from 1971 onwards to explore how many people are able to speak Welsh and whether they retain that ability across censuses 10 years apart. We focused here on two census points, 2001 and 2011. In that piece of work, we looked at multiple Welsh language capabilities (the ability to speak, to read and to write Welsh) and at whether people had one or more of those capabilities, both in 2001 and in 2011. We divided people three ways, as shown in the table on the screen: people who retained an ability; people who gained an ability between the two points, so at the first point they couldn't speak Welsh and at the second point they could; and people who lost an ability over that 10-year period. This graph shows the odds of gaining a Welsh language capability (speaking, reading or writing) over time. At the bottom of the graph we can see quite a strong variation by the number of co-resident Welsh speakers in the household: the more co-resident Welsh speakers you live with, the more likely you are to gain a capability to read, speak or write Welsh. These are the odds ratios for retaining Welsh: people who had one or more capabilities at the beginning of the decade and still had one or more at the end. It's possible that the capability could have changed mode, for example from reading to writing, but I think that's fairly unlikely. We see an increase with age, although obviously the error bars overlap quite strongly for some of those categories.
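The three-way split and the odds ratios can be sketched together. This is a minimal illustration with invented records, not the published analysis: the odds reported in the talk were adjusted for socioeconomic status, which needs a regression model, whereas this computes only an unadjusted odds ratio for one comparison (women vs men) with a Woolf log-scale 95% interval. The "none" category (no capability at either census) is an addition here so that every record gets a label:

```python
import math
from collections import Counter

NO_WELSH = (False, False, False)  # (speak, read, write)


def classify(caps_2001, caps_2011):
    """Classify change in Welsh capability between two censuses.

    Each argument is a (speak, read, write) tuple of booleans. Having any one
    capability counts, so a change of mode (e.g. read -> speak) is 'retained'.
    """
    before, after = any(caps_2001), any(caps_2011)
    if before and after:
        return "retained"
    if after:
        return "gained"
    if before:
        return "lost"
    return "none"  # no capability at either census


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio for a 2x2 table with a Woolf (log) confidence interval.

    a, b: with / without the outcome in the comparison group
    c, d: with / without the outcome in the reference group
    """
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - z * se)
    high = math.exp(math.log(ratio) + z * se)
    return ratio, low, high


# Invented records: (sex, capabilities in 2001, capabilities in 2011).
people = (
    [("F", NO_WELSH, (True, False, False))] * 6
    + [("F", NO_WELSH, NO_WELSH)] * 94
    + [("M", NO_WELSH, (False, True, False))] * 4
    + [("M", NO_WELSH, NO_WELSH)] * 96
)

# Gaining is only possible for people with no capability at the first census.
counts = Counter(
    (sex, classify(c01, c11)) for sex, c01, c11 in people if not any(c01)
)
ratio, low, high = odds_ratio_ci(
    counts[("F", "gained")], counts[("F", "none")],
    counts[("M", "gained")], counts[("M", "none")],
)
print(f"OR women vs men: {ratio:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With counts this small the interval is wide; the point of the sketch is the bookkeeping (who is eligible to gain, how the 2x2 table is formed), not the estimate.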
After adjusting for socioeconomic status, the odds of gaining Welsh compared with not gaining it were raised for women compared with men, for people with qualifications compared with those with no qualifications, and for people who lived with three or more co-resident Welsh speakers, meaning other people in the same household. The odds were lower for people who were separated or divorced, and for the married, compared with the never married or partnered. Using the LS, we are also able to look at people living in England who indicated that Welsh was their main language, and there's a slight oddity in the census data in this regard. If you lived in Wales, you were asked whether your main language was English or Welsh. If you lived in England, you were asked whether your main language was English or other, so you couldn't easily indicate on that question that you spoke Welsh primarily, but you could write Welsh in as the "other" response. Some people did, and that was recorded in the LS data. We concluded that many people in England reported Welsh as their main language, and this was more likely if they had previously been resident in Wales for at least two censuses: we were able to look at how many times people had been in Wales at a census before being resident in England. In policy terms, we thought it would be easier to achieve the Welsh-speaking goal if Welsh speakers in England were included as well as Welsh speakers in Wales, and of course Welsh speakers in England may move back to Wales over time. Here on the slides are some examples of previous studies of language that use LS data.
They include differential factors driving geographic variation in mortality in Scotland versus England and Wales; ethnic identification among immigrants and their descendants across multiple generations; ethnic migration and mobility; work on immigration and language spoken; and work on neighbourhood and social integration. If you want to use the LS data (and of course it's not just the LS in England and Wales, but also the LS in Scotland and in Northern Ireland), you can go to calls.ac.uk, which has information about all three studies.