Good afternoon everyone, and welcome to this webinar, which is an introduction to the harmonised BHPS and Understanding Society data. My name is Simon Parker. I'm based at the UK Data Service in the user support team, and I'm joined here by Laura Fumagalli from the Institute for Social and Economic Research. I'd also like to say thank you to Deborah Wiltshire for preparing these slides.
Okay, so in this webinar we're going to be looking at the harmonisation project for the BHPS, the British Household Panel Survey, and Understanding Society. Laura's going to be talking to you about that. I'm then going to speak to you about how you access this data through the UK Data Service, how you find support, and where you find the documentation and further help that you might need. We will have time for some questions at the end. On the right-hand side of your screen you should have a questions box. If you type your questions in there, we'll do our best to answer them. Of course, there are quite a few people joining us today, so if we don't have time to answer your question within the session, we'll email you an answer.
So, a quick introduction. The British Household Panel Survey started in 1991. There are 18 waves, taking us up to 2009, and really it's a research resource which looks at behavioural, social and economic changes in households and individuals across this period. This was then followed by Understanding Society, which covers a slightly wider range of topics and has a much larger sample. Approximately 40,000 households have been involved in Understanding Society, making it one of the largest surveys of its type in the world. Now, these are both multi-topic household panel surveys. So they follow the same households over a period of time and, being multi-topic, they cover a wide range of issues. So you can actually use this data for many different research questions.
When the BHPS came to an end, the members were invited to join the Understanding Society study, and they did so from wave two of the study. Approximately 6,007 households agreed to continue as part of the Understanding Society cohort, out of around 8,000 households who were in the BHPS in its last wave. Now, these studies are designed to be used together, and there has actually been a harmonisation project undertaken in order to facilitate research which combines the studies. So for the households who were part of the BHPS and joined Understanding Society, you should be able to get around 25 to 26 years of data. Now I'm going to hand you over to my colleague Laura, who's going to talk to you in much more detail about the harmonisation project.
Okay, good afternoon. I'm going to tell you what the BHPS and Understanding Society harmonisation project is. I'll talk about the principles, the consequences of those principles, a bit of the research which has been carried out, and the specific problems we encountered in doing the harmonisation and how we solved them. Simon has already introduced the surveys very well. So basically the harmonised data set is a revised version of the BHPS data from wave one to wave 18, and it is designed to be used together with Understanding Society waves one to eight. So all together the data set covers, as Simon said, more than 25 years. What we did was mostly to modify the BHPS to match Understanding Society, rather than modifying Understanding Society itself. We did make some small changes to Understanding Society to make it easier to work with the BHPS, but it's basically the BHPS which has been changed. So compared to the standalone BHPS data set, the new harmonised BHPS data set has a few differences. We will go through them later.
So there is quite a bit of research using Understanding Society and BHPS together, and that was there even before the release of the harmonised BHPS and Understanding Society data set. This research has three common characteristics. The first is that it is quite new, because Understanding Society is quite new in itself. The second is that it is quite cutting edge, and the third is that it is interdisciplinary. So these are all, from one point of view, very positive things, but there is also a quite negative thing, which is that it was extremely time-consuming. It was possible, obviously, but it was quite time-consuming before the release of this harmonised data set. So with this harmonised data set we really want users to find it easier to use these two data sets together. They are designed to be used together but, yeah, it was not that easy in the past. So I do think that this type of data set [unclear] opens a lot of new and exciting opportunities for research.
In particular, I have listed eight broad types of topic that people have addressed thanks to the combined use of BHPS and Understanding Society. So for example, a very topical type of research is studying neighbourhood effects. Since neighbourhood characteristics change slowly over time, in general we need quite a few years to carry out this type of research. So you cannot use longitudinal information covering just two or three years, and 25 years is already a good number of years for this type of research. So here I collected a couple of papers on this broad topic, mainly studying the effect of ethnic diversity on various outcomes, such as attitudes toward immigration and satisfaction with the neighbourhood. Another topic which can be studied, and should be studied, using this type of data set is relationships which may vary with the economic cycle.
So we want to have quite a few different points in time to be able to capture different points in the economic cycle. And here there is quite a lot of research about the labour market impact of leaving education and non-employment, the relation between unemployment and age, the scarring effect of unemployment over the economic cycle, and also the role of saving and the characteristics of saving behaviour given the economic cycle. And all of these have used the two data sets together.
We may want to study the long-run impact of behaviours and policies. For example, this paper studies the impact of saving behaviour as a child on saving in adulthood. So we want to be able to follow a lot of people from childhood to adulthood, and obviously in this case we can use the two data sets combined. We may want to study cumulative effects. So for example, the income-health gradient when income is measured over the life course.
Another quite interesting opportunity opened up by this data set is to carry out normal panel data analysis, but comparing England with other very relevant countries that have a similar data set and long panel data. Mainly these countries are Germany, with the German SOEP, and Australia, with HILDA. So these are the main countries comparable to the UK. And there is quite a bit of research making this comparison, studying, as you can see, within-couple inequalities in women's labour market outcomes, home ownership and well-being, the transition from the parental home to home ownership in Britain and Germany, and job polarisation in the UK and Germany. It is very interesting research because you can also compare the phenomenon in different countries.
You may also want to study long-run trends, and here there is a lot, especially in demography. For example, we may want to study what's happening in demography from, say, the end of the 80s to the 2000s.
We may want to study social mobility in England, the transition into adulthood, or labour market segregation over decades. Or, and this is I think particularly exciting, you can carry out research on epigenetics. Why is it important to use Understanding Society and BHPS? Because the epigenetic data have been produced for the BHPS subsample. So you have these people covered from whenever they entered the BHPS until now, and you also have genetic and epigenetic data. An example of this type of research is this paper, which is at R&R at the moment, on socioeconomic position and accelerated DNA methylation age. Finally, you may want to study relatively small subsamples of the population, where you want to have more waves of data. So you also have a larger sample size and can observe more change. And for example, these are two examples, on adolescents and retired people.
So how does this harmonisation work? Basically, when we decided how to harmonise the BHPS and Understanding Society, we thought it was good to follow a variable-first approach. This is not the only possible approach. It is the approach we thought was best. So basically, we first searched for potential matches between one or a few BHPS variables and an Understanding Society variable. Then we assessed the comparability of these variables. And then only variables meeting a preset standard of comparability, for example having the same or highly similar question wording and routing, were considered harmonisable.
This general principle has quite a few consequences. The first consequence is that this is an ongoing project. So the number of harmonised variables is likely to increase over time, and is already increasing, because new matches can be found and users can send us information and requests, and we try to accommodate them.
But in spite of this change in the data set over time, we will always try to keep the quality of the harmonisation roughly constant. So we don't want to harmonise everything that could possibly be harmonised. We want to have good-quality harmonised information. For this reason, we have quite restrictive criteria for harmonisation. And for this reason, we have quite minimal data recoding, because we don't attempt a non-straightforward type of harmonisation. Variable renaming is also minimal and, when possible, we keep the same variable roots as in BHPS. So for those people who are used to the standalone BHPS, it shouldn't be that difficult to switch to this harmonised data set.
And then we release harmonised and non-harmonised variables in the same file. This is different from, for example, other harmonisation projects, where just the harmonised variables are released. We release everything, harmonised and non-harmonised, in the same files as in the original data set. So we have bw_indresp, bw_indsamp, bw_hhsamp and bw_hhresp, as in the normal BHPS data set, but the new data sets also include harmonised variables.
So we had the first release with Wave 7 last November, and this includes the core of the harmonised data set. At the moment, the structure of egoalt, for example, is harmonised. What does that mean? It means that the original BHPS had the relationship coded as alter to ego, while Understanding Society codes the relationship as ego to alter. So basically, we just mirrored the relationships. We kept the relationships which don't change, so a spouse is a spouse and a partner is a partner. But we mirrored the other relationships, so aunt and uncle become nephew and niece, and natural parent becomes natural son or daughter.
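To make the mirroring idea concrete, here is a minimal sketch in Python. It is not the project's actual recoding: the relationship labels and the `MIRROR` table are illustrative assumptions built only from the examples mentioned in the talk, and the real coding frames should be taken from the data set documentation.

```python
# Illustrative labels only (an assumption, not the real BHPS or
# Understanding Society codes). Asymmetric relationships are paired
# with their mirror image; symmetric ones are simply left out.
MIRROR = {
    "natural parent": "natural child",
    "natural child": "natural parent",
    "aunt/uncle": "niece/nephew",
    "niece/nephew": "aunt/uncle",
}


def mirror_relationship(label):
    """Return the label as read from the other side of the pair.

    Labels missing from MIRROR (spouse, partner and so on) are treated
    as symmetric and returned unchanged.
    """
    return MIRROR.get(label, label)


def mirror_records(records):
    """Mirror the label in each (person, other_person, label) record.

    The pair of identifiers is kept as it is; only the direction in
    which the relationship is described is flipped.
    """
    return [(p, o, mirror_relationship(lab)) for p, o, lab in records]


# Hypothetical household: persons 1 and 2 are spouses, 3 is their child.
one_direction = [
    (1, 2, "spouse"),
    (1, 3, "natural parent"),
    (3, 1, "natural child"),
]
print(mirror_records(one_direction))
```

Mirroring twice gives back the original labels, which is a convenient sanity check when converting a relationship file from one direction to the other.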
So at the moment we have relationship_bh, which is very similar to relationship_dv in Understanding Society. They're not exactly the same, but in most cases they can be used as a harmonised variable. In the future, our idea is to provide the exact relationship_dv in the harmonised data set as well. But this is for a future release. Then we have xwavedat. We have a single file for BHPS and Understanding Society, which can be used for both. I think this is very useful. Then we harmonised indresp, indsamp, indall, hhresp, income and youth. In these we harmonised variable naming conventions, variable names and variable response categories. And then we have variables existing in Understanding Society only that can be derived from the BHPS data by combining two or more variables.
In harmonising the data set, we had to cope with a few problems, and some of them were trickier than others. The first problem was that we had different naming conventions. So we tried to harmonise the naming conventions between the two data sets as much as we could. For example, the variable names and data set names in BHPS had the letter for the wave, while Understanding Society had the letter for the wave and an underscore. We now use the Understanding Society naming convention, but we add the letter b to the letter indicating the wave. So we have bw_, which means b for BHPS, plus the letter indicating the wave, and then the underscore. And this is applied to both data set names and variable names.
Then, in some cases, variables with the same content had different names. So you couldn't really append them easily and use them as a single variable. So we gave the BHPS variables the Understanding Society names.
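The prefix convention just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, and `sex` is used purely as an example of a variable root. It assumes wave 1 maps to the letter a, wave 2 to b, and so on, with 18 BHPS waves.

```python
import string

BHPS_WAVES = 18  # the BHPS ran for 18 waves


def wave_letter(wave):
    """Map wave 1 to 'a', wave 2 to 'b', and so on."""
    if not 1 <= wave <= 26:
        raise ValueError(f"wave out of range: {wave}")
    return string.ascii_lowercase[wave - 1]


def ukhls_prefix(wave):
    """Understanding Society style prefix: wave letter plus underscore."""
    return f"{wave_letter(wave)}_"


def harmonised_bhps_prefix(wave):
    """Harmonised BHPS style prefix: 'b', wave letter, underscore."""
    if not 1 <= wave <= BHPS_WAVES:
        raise ValueError(f"BHPS has waves 1 to {BHPS_WAVES}, got {wave}")
    return f"b{wave_letter(wave)}_"


def split_name(name):
    """Split a prefixed name such as 'ba_sex' or 'a_sex' into (prefix, root)."""
    prefix, sep, root = name.partition("_")
    letters = string.ascii_lowercase
    is_ukhls = len(prefix) == 1 and prefix in letters
    is_bhps = len(prefix) == 2 and prefix[0] == "b" and prefix[1] in letters
    if not sep or not root or not (is_ukhls or is_bhps):
        raise ValueError(f"no wave prefix found in {name!r}")
    return prefix + "_", root


# 'sex' is used here only as an example root.
print(harmonised_bhps_prefix(1) + "sex")
print(ukhls_prefix(1) + "sex")
print(split_name("br_sex"))
```

Stripping the prefix leaves the variable root, which is what has to agree between the two studies before files from different waves can be stacked into one long-format data set.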
So for example, look at this example. The first question had more or less the same name, but in BHPS it used letters, while in Understanding Society it used a set of numbers, plus the different wave naming convention. And we just renamed the variables to match the Understanding Society ones. But there were also cases where the two names were completely different. In this case it was even more difficult for users to realise that the two variables were in fact the same variable. Now it should be much easier, because the two variables have exactly the same name and the same naming convention. In this case, the original names of the BHPS variables disappear. But we do think that the documentation can help you navigate between the old name and the new name.
Then we had variables with the same name but different content or coding frame. This was, I think, even more dangerous for users, because users could have appended the data sets and used them as the same variable when in fact they were not. So to differentiate those, we gave the BHPS variables the suffix _bh. So you can make sure the two variables are not treated as the same variable. And then, when possible, we created new Understanding Society type variables by recoding the BHPS variables. So for example, for this first set of cases, we were not able to create an Understanding Society equivalent. So we just renamed the BHPS variable, so that it cannot be confused with the Understanding Society variable. While in the second group here, for example jbsect, jbsect in BHPS and in Understanding Society were not the same variable. But it was possible, by recoding the original BHPS jbsect, to create an Understanding Society type jbsect.
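As a rough illustration of why the _bh suffix helps when appending files, the sketch below sorts variable roots into those that share a name across the two studies and those that exist on one side only. The function name and all variable roots here are hypothetical placeholders, not real variables from either study, and sharing a name is only a first check: the documentation remains the authority on whether two variables are truly comparable.

```python
BH_SUFFIX = "_bh"  # suffix marking BHPS variables kept in their original coding


def classify_roots(bhps_roots, ukhls_roots):
    """Group variable roots (names with the wave prefix already removed).

    'appendable' roots carry the same name in both studies; roots ending
    in _bh are BHPS-specific versions that should not be stacked with an
    Understanding Society variable.
    """
    bhps, ukhls = set(bhps_roots), set(ukhls_roots)
    bhps_only = bhps - ukhls
    return {
        "appendable": sorted(bhps & ukhls),
        "bhps_only_bh": sorted(r for r in bhps_only if r.endswith(BH_SUFFIX)),
        "bhps_only_other": sorted(r for r in bhps_only if not r.endswith(BH_SUFFIX)),
        "ukhls_only": sorted(ukhls - bhps),
    }


# Hypothetical roots, for illustration only.
example = classify_roots(
    ["sex", "somevar", "somevar_bh", "oldvar"],
    ["sex", "somevar", "newvar"],
)
for group, roots in example.items():
    print(group, roots)
```

In this toy example `somevar` appears in both lists, standing in for a BHPS variable recoded to the Understanding Society frame, while `somevar_bh` keeps the original BHPS coding under a name that cannot be appended by accident.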
So we kept the original BHPS variable with the suffix _bh, and then we recoded the BHPS variable to create a variable which is the same as the Understanding Society variable, jbsect in this case.
The last group of interventions concerns batteries of questions. We have quite a few cases where batteries of questions were only partially carried over into Understanding Society. So we want people to be able to use the whole battery in BHPS alone, but we also want people to be able to use the part of the battery which exists in both surveys for all the years. So what we did was keep the original battery of questions, and the questions carried over into Understanding Society are duplicated and given the Understanding Society name. So for example, in this battery [unclear] just three variables were carried over into Understanding Society, so those were duplicated. So it's possible for everybody to use the original full battery of questions, but it's also possible for users to use the three variables which have been carried over together with the Understanding Society variables. Yeah, this is about it for the characteristics.
As I told you, the harmonisation project is an ongoing project, so we have a short, mid and long-term plan for improving the data sets. So there is a new release that we call 7.1, because it's our first between-wave release, which is going to happen in June 2018, so quite soon. We are going to release a harmonised version of hhsamp, which was not harmonised in the Wave 7 release. We will correct some mistakes in the Wave 7 release that we spotted ourselves or thanks to some users. For example, we will remove some cases where the suffix _bh was used in a non-straightforward manner.
We are going to add more individual questionnaire variables and quite a few more derived variables, because I'm working on this at the moment. In the future, which can be the near future, for example Wave 8 in November, we plan to harmonise pointers and identifiers. This could also be done by Wave 7.1, but I'm not completely sure at the moment. We hope to have a full harmonisation of egoalt as soon as possible. We also plan more harmonisation of income variables. For example, the derivation of net income is not fully harmonised at the moment. We are going to tackle this problem soon. We want to harmonise labels. We want to add more and more derived variables. And in the more long-term project, we want to add more value-added data sets, for example the partnership histories, which at the moment exist as separate files. We would like to turn them into a fully harmonised file.
There is documentation, which I think is quite useful. It will also be updated. So if you are starting to use the data set, you can start by looking at it. It explains what the harmonised data sets can be used for, why they have a specific set of characteristics, and how specific variables and data sets have been treated. There is also a new website. I don't think we can open it from here, but we can try to open it later. This is a new website where information on the frequencies, question wording and question routing is available for both Understanding Society and BHPS waves, and this is quite useful, I think.
This project is a very collaborative project, and we had a set of beta users trialling the data sets in July 2017. This was very useful. So we really welcome suggestions from users. You can help in many ways. You can provide feedback on the current state of the project, or you can just let us know which variables you have harmonised before. Some of them are probably not going to meet our standard of harmonisation.
Some of them will probably meet it. So if you have done some research, please get in touch. And if you think that there are some variables which could be harmonised and we haven't released them yet, please contact us. Or you can just say, look, I have this data I'm working on and I do think there is more scope for harmonisation. I don't know exactly which variables, but I think this is very important and it could be a useful direction to look at. Please tell us. Or you can suggest improvements to the documentation, or you may want to collaborate with us on either substantive or methodological work exploiting the features of these data sets, or on the harmonisation itself. So please get in touch if you have any of these questions and requests. You can send an email to the Understanding Society email address, or you can just contact me at lfumag@essex.ac.uk.
Okay, so thank you very much, Laura, for talking us through the project. If you have any questions about things Laura has spoken about, in the GoToWebinar sidebar you should find a section called questions, and if you post them in there we will have a chance to look at them at the end of the presentation. So I'm just going to talk briefly about accessing the data and further resources. Now, the Understanding Society data can be accessed via the UK Data Service. In order to do so you will need to be a registered user. Registration is free, and that will give you access to what we call our end user licence version of the data. End user licence data only goes down to government office region. If you want data with slightly lower levels of geography, you can apply for the special licence version, and if you're based at an institute for higher education in the UK you could potentially apply for the secure access version of the data, which contains the lowest levels of geography and some other more sensitive variables.
Through our website you'll also be able to find a range of useful documentation and resources. I'm just going to quickly show you the website. One second. So here we have the catalogue record page for Understanding Society waves one to seven and the harmonised BHPS data. If you're looking for it on our website, it has the study number 6614 and the page looks something like this. Now, as you can see, we have some detailed information, so you can see the title of the study, obviously. You can see the series that it's part of, and as we scroll down the page we can find some other important information. Here we have the citation for the study, and we do request that if you're using the data from us you cite the data correctly, as you would with any source of reference. This way the data owner can demonstrate their impact as well. As we scroll down, there's further information that really helps us to understand what this study contains. So we have an abstract, which outlines what the aims of the project are and other information about the collection. There's also guidance on where you can find the special licence and secure access versions and what the differences between the data sets will be. The coverage, universe and methodology section gives you some guidance about the actual type of data, and we have this for all of the studies within our archive. So we can see here that for spatial units, as this is the end user licence version, we have countries available. We also have government office regions. We can see that, for number of units, there are over 40,000 households included in wave one. The documentation will give you a breakdown of numbers and response rates. Further down the page is probably the most important section, particularly when you're starting to work with a data set. We have a documentation section.
You'll find these on all the catalogue records for data at the UK Data Service, and here we provide as much of the information and documentation for the study as we can for you to download and look at before you begin working with the data. So as we can see, we have a user manual for the BHPS. We also have fieldwork documentation and questionnaires for each of the waves. We have some study information and citation information, with a read me file just down at the bottom, and there we include information about any of the processes that we've carried out on the data, with the quality assurance checks we ran when the data came to us. Through this you can really find information that will help you understand the data, and I would strongly suggest that whenever you begin working with the data set you have a look at the user guides and the questionnaires, just to give you that really good grounding in what's going to be in the data. There are also related studies and guides. So we can see here that some of the related studies include the waves two and three nurse health assessment, the Innovation Panel, where the methodology for Understanding Society is developed, and other data sets which may deal with similar issues. We also have a number of case studies for this particular study, and in the case studies what you will find is research which has been conducted using these data sets to address certain questions. So for example, we can see there's one there, what predicts our level of well-being, where the authors have used the data in order to look at well-being. As you work with the data you may start to produce syntax files. If you have created some syntax files which you think may be of use to other users, do feel free to upload them. You can do that here. Don't worry, they don't need to be textbook perfect.
As long as they're functional, that's more than sufficient, and you will also receive the citation if anyone downloads them. So as you can see there, we do list the authors for the studies. Now I'll just return to my slides. You may also want to look at the Understanding Society page. It's one of the better ones, I would say, not biased at all. The documentation provided online really is fantastic. You get very detailed information in terms of user guides and data set documentation. It's all searchable and it's very, very easy to use. You can see the web page just at the bottom there. It's understandingsociety.ac.uk slash documentation and then slash mainstage, and that will show you the documentation that relates to the main survey. And as you can see on the right-hand side, we have access to the user guides, questionnaires, data set documentation, et cetera. Now, if you're looking for help with the Understanding Society data, you can use the user support forum, which you can access by going to data and documentation and then selecting the highlighted user support forum. All you need to do is very quickly register in order to post an issue. So if there's a problem that you're having with the data, you can ask a question in there and someone from the Understanding Society team will get back to you with an answer. It may also be worth exploring the forum if you have a particular issue, to see whether that question has already been asked. It may well be that there's something in there which will answer your question, but if you can't find it, do of course feel free to post a new question. Now, we also provide some support for using the Understanding Society data, and we can certainly help when there are issues to do with, say, data quality or things in the data that perhaps are not clear. If you want to contact support at the UK Data Service, you can do so through our website.
You can see the link there, ukdataservice.ac.uk slash help, and from there you'll have a list of forms that you can fill out depending on the nature of the support that you require. So there will be a form for, say, accessing data or understanding and using data. Fill in the correct form and it goes off to the correct team here at the UK Data Service, and they'll be the people best placed to get back to you with an answer. We aim to have an answer or a response to your query within three working days. It may take a little longer if we have to go and investigate a matter, but we will keep you informed of what we are doing. You can also follow us using our JISCMail service, which you can do by going on to JISCMail and looking for the UK Data Service. We contact this mailing list once a week and outline new data sets and updated collections in the archive. So if you're waiting to see when the next wave of Understanding Society is going to be available, if you sign up to that you'll get a weekly email with all the data sets that have been added that week and any updates we have made to them. You can also use Twitter. We're @UKDataService if you have a particular question. You can also speak to the Understanding Society team at ISER at @usociety, and we have a Facebook page as well, as you can see there.
Now we've got time for questions, so if you do have any questions, please put them in the question box on the right-hand side and we will certainly do our best to answer them. If we don't have time to answer your question, or it's something that we have to go away and look into, we will email you an answer afterwards. As mentioned before, this webinar will be available on our YouTube channel within a few working days, so if you look for us on YouTube, our channel is UK Data Service, you'll be able to revisit this webinar and listen to it as much as you'd like.
Okay, so thank you very much for joining us, and thank you very much to Laura for joining me. Goodbye, everyone.