Hello, and welcome to this webinar organized by CESSDA. Our topic today is data for research into aging. I'm Jen Buckley, I work for the UK Data Service, and I'll be joined by Stefan Gruber, who works on managing the SHARE database; Siobhan Leahy from The Irish Longitudinal Study on Ageing; and Drystan Phillips, manager of the Gateway to Global Aging Data, which is based at the University of Southern California. Here is an overview of the webinar. I'm going to start with some very brief background information about CESSDA and social science data services, and a very quick overview of some of the available data. We'll then move on to the main part of the webinar. First, Stefan will discuss SHARE, the Survey of Health, Ageing and Retirement in Europe. We'll then have Siobhan discussing The Irish Longitudinal Study on Ageing, and Drystan is going to talk about the Gateway to Global Aging Data. There'll be an opportunity to ask questions: you can type them into the Questions box on the webinar control panel. This is usually on the right of the screen, and if you can't see it, you may need to maximize the control panel by clicking on the red box with an arrow. Feel free to type questions at any point; we'll pick them all up and answer them at the end. We will provide a copy of the slides after the webinar and also add a recording of the webinar to the CESSDA website. CESSDA is the Consortium of European Social Science Data Archives. It aims to provide the research infrastructure that enables the research community to conduct high-quality research, and key tasks underlying this aim include developing standards and best practices for the management and archiving of social science data, and facilitating research access to important resources. National Data Services are the core operational bodies in CESSDA.
National Data Services provide access to extensive collections of data, and you may be familiar with the service in your country or with some of the larger services such as GESIS or the UK Data Service. All National Data Services have websites with online catalogues for searching and browsing. On the screen is the catalogue for the Swedish National Data Service. These catalogues allow you to search using terms such as "aging" or to search for particular studies, and there are usually ways to refine and sort your results. There are many National Data Services in Europe. This image comes from the CESSDA website, where you can find information about and links to all of them. There's a wealth of data within Europe relating to aging and its effects on individuals and society. Research infrastructures such as National Data Services provide access to data collections from large-scale projects alongside outputs from smaller research teams and individual researchers. There are major studies focused on aging and related topics, including longitudinal studies, studies collecting biomarkers, and harmonization across key studies, and dedicated cross-national projects also facilitate comparative research. To help you navigate all the data that's available, we've put together a short information sheet, which summarizes the major data resources and gives details about how to find and access them. The information sheet will be available as a handout from the webinar, and we'll also make it available on the CESSDA website. In summary, some of the most significant studies for aging research belong to a family of surveys focused on health, aging and retirement, and these are the focus of the presentations that follow. There are also other aging-focused data resources in Europe, such as the Norwegian Life Course, Ageing and Generation Study (NorLAG) and, in Sweden, the National E-Infrastructure for Aging Research (NEAR).
Many of the cross-national studies are also useful for research into aging. For instance, the Generations and Gender Programme looks at family relationships across both generations and genders. National Data Services provide access to extensive collections, and relevant examples include the general social surveys, many of which include questions on topics such as health and retirement alongside other social topics. Longitudinal studies, which allow analysis of change at the individual level, include both household panel studies and cohort studies. Several European National Data Services also give access to qualitative data, such as in-depth interview transcripts, field notes and answers to open-ended questions. If you want to know more about National Data Services, we also have a guide that introduces how to find and access data from them; it includes a number of key questions and summary information about individual data services. At this point I'll pass you over to Stefan, who will discuss the SHARE project with us.

Yes, I'm talking about the SHARE study, the Survey of Health, Ageing and Retirement in Europe. First of all, why was SHARE started? I think we all know that population aging is one of the main challenges of the future. On the right-hand side you can see the age structure of the EU-15 and its development until the year 2050, and you can see a clear shift towards the older age cohorts. Although this has been known for quite a while, there was insufficient information to understand, first of all, the living conditions of older people and what influences them, and also how state policies influence these living conditions. This is why SHARE was started in 2004. We explore the European natural laboratory. What do I mean by that?
We now have 21 countries in SHARE, and all these countries are affected differently by population aging and have different political and cultural contexts. That's why I talk about Europe as a natural laboratory. The final objective of SHARE is to provide policymakers with reliable information and ultimately to turn the challenges of population aging into opportunities. SHARE focuses on three main dimensions. The first is the economic living conditions of our respondents. The second is health: not just physical health but also mental health, disability and psychological health, and health care is also a very important part of this dimension. The third is the social network of our respondents. These three dimensions are not independent of each other; they influence each other and are interconnected. They are also influenced by the political and cultural context, and they are not static but dynamic. That's why we need longitudinal information to see how the living conditions of older people change. So those are the main principles of SHARE. SHARE is a longitudinal survey with a panel structure, with repeated observations usually every two years. So far we have released six waves of SHARE, with around 120,000 individuals and around 300,000 interviews. SHARE covers 20 European countries plus Israel, and we conduct ex-ante harmonized computer-assisted personal interviews (CAPI). This slide gives an overview of the different data collection waves. We started with wave one in 2004 with 12 countries, and you can see that with each wave some new countries joined SHARE. Let me emphasize two things on this slide. The first is wave three, which is a special wave because in it we did not collect the usual panel information but life history data. I will talk about that a little later.
The other thing I want to point to on this slide is wave seven, which is in the field at the moment. As you can see, a lot of new countries will be joining SHARE; altogether there will be 28 countries in wave seven. The content we are collecting in wave seven is also very interesting, because in this wave we combine regular panel questions and retrospective life histories. Here you see an overview of the 30 interview modules that we have: the SHARE interview and questionnaire are structured in different modules, or content areas. I don't want to go through all the modules right now. The important takeaway is that not every module is part of every wave. Marked in red, for example, you can see that the social network module was part of wave four and was then asked again in wave six. So when you do research with SHARE data, always keep in mind that an interview module might not be part of every wave. Another important point is that not every module is answered by each respondent; we do this to save interview time. Here I show you, as an example, what we call the household respondent. He or she answers questions on housing, household income and household consumption on behalf of the other household members. When you look at the data later on, this means that respondents who live in the household but are not the household respondent will have missing information on these items. So always check the questionnaire routing, and keep in mind that we have these special respondents in SHARE. In addition to the regular interview modules, we also offer 17 generated variable modules. Here, for example, we help users by providing scales or derived measures such as the BMI, the body mass index. So we do a lot of work for the users here.
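A common preparation step follows from the household respondent design: household-level answers sit only on the household respondent's record, so analysts often copy them to the other members of the same household. The sketch below is a minimal illustration in plain Python, assuming a simplified one-row-per-person layout with hypothetical field names (`hh_id`, `is_hh_respondent`, `hh_income`); these are not SHARE's actual variable names, and the release guide and questionnaire routing should be checked before doing this with real data.

```python
from typing import Any

# Hypothetical, simplified records: one dict per respondent.
# Field names are illustrative assumptions, not SHARE variable names.
Record = dict[str, Any]

HOUSEHOLD_FIELDS = ("hh_income",)


def propagate_household_answers(records: list[Record]) -> list[Record]:
    """Copy household-level answers from the household respondent
    to every other member of the same household (same hh_id)."""
    # Collect the household respondent's answers per household.
    by_household: dict[str, Record] = {}
    for rec in records:
        if rec.get("is_hh_respondent"):
            by_household[rec["hh_id"]] = {f: rec.get(f) for f in HOUSEHOLD_FIELDS}

    filled: list[Record] = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input list is not mutated
        source = by_household.get(rec["hh_id"])
        if source is not None:
            for field in HOUSEHOLD_FIELDS:
                if new_rec.get(field) is None:
                    new_rec[field] = source[field]
        filled.append(new_rec)
    return filled


people = [
    {"id": "A1", "hh_id": "A", "is_hh_respondent": True, "hh_income": 30000},
    {"id": "A2", "hh_id": "A", "is_hh_respondent": False, "hh_income": None},
    {"id": "B1", "hh_id": "B", "is_hh_respondent": False, "hh_income": None},
]
result = propagate_household_answers(people)
for row in result:
    print(row["id"], row["hh_income"])
```

Household B stays missing because no household respondent appears in this toy input; in practice that is a case worth flagging rather than silently filling.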
The weights are also very important, and you will find them among the generated variable modules. We offer both longitudinal and cross-sectional weights, and imputations are also provided as a generated variable module. How are the data documented? If you visit our website, you will see a tab on the left-hand side labelled Data Documentation, and there you will find a table with all the documentation files you need when you start working with SHARE data. To begin with, the most important documentation files are the release guides. We have two release guides: one for the regular panel waves of SHARE (waves one, two, four, five and six), and an additional release guide for SHARELIFE, the retrospective life histories. The questionnaires are also very important, because there you can see the exact question wording and the routing, which is the reason for most of the missing values you will find in the data. Apart from our regular SHARE data, we also provide some so-called special data sets, and I mainly want to present three of them. The first is SHARELIFE, the third wave of data collection. I also want to talk briefly about the Job Episodes Panel and easySHARE. So, SHARELIFE: as I said before, this is the third wave of data collection, and here you will find retrospective life history data on health, employment, the financial situation, partner history and housing, as well as some childhood information. We collect these data with a so-called life history calendar. The basic principle, as you can see on this slide, is that we first ask about the most important life events. In this example, the respondent had two children, born in 1975 and 1978.
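The anchoring idea behind the life history calendar can be illustrated with a toy sketch: given the anchor events collected first, find the anchor closest to a year the respondent is trying to place, so that the question can be framed as "before or after" that event. This is a hypothetical illustration of the principle only, not SHARE's interview software; the class, function names and event labels are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorEvent:
    year: int
    label: str


def nearest_anchor(anchors: list[AnchorEvent], year: int) -> AnchorEvent:
    """Return the anchor event closest in time to the given year.
    The list is sorted first, so ties go to the earlier event."""
    ordered = sorted(anchors, key=lambda a: a.year)
    return min(ordered, key=lambda a: abs(a.year - year))


def recall_prompt(anchors: list[AnchorEvent], year: int) -> str:
    """Build a simple before/after question relative to the nearest anchor."""
    anchor = nearest_anchor(anchors, year)
    return f"Was that before or after {anchor.label} ({anchor.year})?"


# Anchor events from the example on the slide: children born in 1975 and 1978.
calendar = [
    AnchorEvent(1978, "the birth of your second child"),
    AnchorEvent(1975, "the birth of your first child"),
]
print(recall_prompt(calendar, 1977))
```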
The interview can then refer back to these life events, such as the births of the children, which makes it easier for the respondent to remember when certain things happened in their life. Another special data set is the Job Episodes Panel, which is based on the SHARELIFE data. It focuses on the respondents' job histories but also contains information on migration and family biography. What is very interesting about this data set is that it also contains context data, for example on the statutory retirement age and early retirement age, but also on pension contributions and minimum and maximum pension benefits, for each country and over the whole life course of our respondents. This makes it a really interesting and rich data set. The last special data set I want to show you is easySHARE. On this screen you see all the different modules we provide with the regular SHARE data, and when you start working with the data this can be a little complicated. So we generated the easySHARE data set, a single data file, so you can start working with SHARE data straight away. The intention of easySHARE is to provide an easy entry point for users, but we also use it for student training, and we offer it to other universities as a student training data set. easySHARE has the same number of observations as the main SHARE release, but it is restricted to a set of central variables. It has a simplified data structure, so you can download easySHARE and start your analysis directly without any complicated data preparation. You might also be interested to know that the release guide includes example analyses, and, very useful for SHARE users, we also provide the Stata code that generates easySHARE, which can serve as an example for generating your own SHARE panel data set.
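The idea behind easySHARE, one file instead of many modules, can be sketched as a merge of module-level records on a respondent identifier and wave. The sketch below uses plain Python dictionaries and hypothetical module and variable names (`demographics`, `health`, `age`, `bmi`, `id`, `wave`); it is not the easySHARE code, which SHARE provides itself.

```python
from collections import defaultdict
from typing import Any

Key = tuple[str, int]  # (respondent id, wave): a hypothetical key structure


def merge_modules(modules: dict[str, list[dict[str, Any]]]) -> dict[Key, dict[str, Any]]:
    """Merge several module tables into one record per (respondent, wave).

    Each module is a list of rows carrying 'id' and 'wave' keys. A variable
    that a module did not collect in some wave simply stays absent, which
    mirrors the fact that not every module is asked in every wave."""
    panel: dict[Key, dict[str, Any]] = defaultdict(dict)
    for module_name, rows in modules.items():
        for row in rows:
            key = (row["id"], row["wave"])
            for var, value in row.items():
                if var not in ("id", "wave"):
                    # Prefix with the module name to avoid name clashes.
                    panel[key][f"{module_name}_{var}"] = value
    return dict(panel)


modules = {
    "demographics": [{"id": "R1", "wave": 1, "age": 61}, {"id": "R1", "wave": 2, "age": 63}],
    "health": [{"id": "R1", "wave": 1, "bmi": 27.4}],
}
panel = merge_modules(modules)
for (rid, wave), values in sorted(panel.items()):
    print(rid, wave, values)
```

Prefixing each variable with its module name is one simple way to keep names unique after the merge; the real easySHARE uses its own naming scheme, documented in its release guide.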
So as a starting point, easySHARE is a very good and helpful data set. Last but not least, a few words about data access. We don't have many requirements: the only ones you have to fulfil are that you have a scientific affiliation and that you use our data only for scientific purposes. You need to sign a user statement, which you can download from the SHARE website, and we provide only individual access. So if you work in a research group, you should not distribute the SHARE data to the other members of your group; please make sure that everyone who uses the SHARE data is also a registered SHARE user. If you're interested in our newsletter, you can see our email address at the bottom left, and you can also send user questions there. Now I will pass you over to Siobhan.

Thanks, Jen. My name is Siobhan and I'm a research fellow with The Irish Longitudinal Study on Ageing (TILDA). I'm going to present a little about the data structure we use in TILDA and the accessibility of the data on public platforms. We're quite similar to SHARE, so a lot of what Stefan covered also applies to TILDA. We're part of the Health and Retirement Study (HRS) family of aging studies worldwide, which includes SHARE. So a lot of our questions will be very similar or identical to many of the questions included in SHARE and other studies, which makes us highly comparable. TILDA began a little later than SHARE. It's a nationally representative study of the Republic of Ireland; we don't include the Northern counties. Our sampling is based on the Irish GeoDirectory, which is essentially a list of all eligible addresses in the Republic of Ireland. We used a three-stage approach, and this all took place in advance of the first wave of data collection in TILDA, back around 2008-2009.
The first step was to identify the sampling units (clusters), of which there were 3,155 in the country. We then chose 640 of these clusters based on two factors: first, that they would give a geographical spread wide enough to represent all corners of the country, and second, that they adequately represented the different socioeconomic strata within the population. Within each of these 640 clusters, 40 addresses were randomly selected, which resulted in 25,600 addresses being generated and contacted by interviewers. Not every one of those 25,600 households had an eligible respondent aged 50 or over, but among the households that did, we received a 62 percent response rate, and this resulted in a wave one sample of about 8,500 community-dwelling older adults. We have three forms of data collection in TILDA, which makes us a little different from some of the other aging studies. We begin the process with a CAPI interview, similar to SHARE, in which interviewers go into the home and complete a detailed interview with the respondent. This takes roughly one and a half hours, maybe a little longer for older respondents, and it covers all of the domains Stefan already described: social, economic and health circumstances. Once a respondent has completed the CAPI, a self-completion questionnaire (SCQ) is left with them, with a stamped addressed envelope, so they can post it back to us later. The purpose of the self-completion questionnaire is to cover some of the more sensitive material, for example around relationships, that people might not be as comfortable answering in person to an interviewer. Completing this questionnaire is at the respondent's discretion; they're encouraged to do so and then post it back to us.
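The arithmetic of the first two selection stages can be checked with a small simulation. This sketch assumes a hypothetical frame of 3,155 units with made-up address counts, and it draws the 640 units by simple random sampling; TILDA's actual selection also took geographic spread and socioeconomic strata into account, which this simplification ignores. The third stage (identifying eligible respondents aged 50 or over in each household) is not simulated.

```python
import random

N_UNITS = 3155           # sampling units in the country (from the talk)
N_SELECTED = 640         # clusters chosen (from the talk)
ADDRESSES_PER_UNIT = 40  # addresses drawn per cluster (from the talk)

rng = random.Random(2009)  # fixed seed so the run is reproducible

# Hypothetical frame: each unit gets between 50 and 200 made-up addresses.
frame = {
    unit: [f"u{unit}-a{i}" for i in range(rng.randint(50, 200))]
    for unit in range(N_UNITS)
}

# Stage 1 (simplified to simple random sampling): choose 640 units
# without replacement.
chosen_units = rng.sample(sorted(frame), N_SELECTED)

# Stage 2: randomly select 40 addresses without replacement in each unit.
addresses = [
    addr
    for unit in chosen_units
    for addr in rng.sample(frame[unit], ADDRESSES_PER_UNIT)
]

print(len(addresses))  # 640 clusters x 40 addresses = 25600
```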
Everybody who completes an interview is offered the opportunity to do a health assessment. This is probably the largest and most distinctive component of TILDA, in that it's a very detailed health assessment, and it takes place either in a dedicated health centre or in the respondent's own home if they don't wish to travel. It can take up to three hours to complete in the health centre, and it covers a huge range of tests, including some very advanced cardiovascular, vision, gait and cognitive assessments. The design of TILDA is similar to SHARE in that we complete waves approximately every two years. To date we've completed four waves, and we are planning to go into the field with wave five at the start of 2018. If you look at the graphic on screen, you can see that wave one, which was collected from late 2009 to early 2011, included the CAPI, the SCQ and the health assessment, and wave three was similar. So the current model is that every second wave is a health wave, in which we collect the most detailed information, and the intervening waves are restricted to the CAPI and the self-completion questionnaire. This is a summary of the response rates to date in TILDA. Looking at the second column, wave one: of the 8,500 people who completed a CAPI at wave one, 85% also completed an SCQ and 72% attended a health assessment. Roughly 80% of those who attended the health assessment did so at the health centre, and the remainder had a modified assessment in their home, which didn't contain all of the tests. In wave two we had a 90% follow-up rate, and again 85% of people completed the SCQ.
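Converting the quoted wave one percentages into approximate counts makes the scale of each component clearer. This is back-of-the-envelope arithmetic on the rounded figures from the talk (8,500 CAPI respondents, 85%, 72%, about 80%, 90%), so the results are approximations, not published counts.

```python
WAVE1_CAPI = 8500  # rounded wave one CAPI sample (from the talk)

rates = {
    "completed SCQ": 0.85,
    "attended health assessment": 0.72,
}

# Approximate number of wave one respondents in each component.
counts = {label: round(WAVE1_CAPI * rate) for label, rate in rates.items()}

# About 80% of health assessments took place in the health centre;
# the rest were modified home assessments.
in_centre = round(counts["attended health assessment"] * 0.80)
at_home = counts["attended health assessment"] - in_centre

# Wave two follow-up rate of 90%, applied to the wave one CAPI sample.
wave2_followed_up = round(WAVE1_CAPI * 0.90)

print(counts)
print(in_centre, at_home, wave2_followed_up)
```

On these rounded inputs this gives roughly 7,225 SCQs and 6,120 health assessments (about 4,900 in the centre and 1,200 at home), and roughly 7,650 respondents followed up at wave two.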
Similarly, at waves three and four we had an overall response rate of around 85%, with 85% of this sample then completing an SCQ. This represents a very good retention rate compared with other studies, and it's something we work very hard on maintaining across the waves. All of the TILDA data are housed here in Trinity College Dublin, and that is the full, unrestricted data set with all of the information collected. We also have publicly accessible data, available through the Irish Social Science Data Archive (ISSDA), which is housed in UCD in Dublin, and also through the University of Michigan, and our harmonized data sets are available on the Gateway to Global Aging Data, which Drystan will talk about later. Similar to SHARE, we look for an affiliation to an academic institution before releasing data, and we release it on the understanding that it's not for commercial use. Currently waves one and two are available, and wave three will be coming online in the very near future. One issue that makes it difficult to make our data publicly accessible is weighing up our duties in terms of data protection against data access. Because we're highly representative of the overall Irish population, we are very cautious not to release any data that may be identifiable. Each adult in the original TILDA sample of eight and a half thousand represents approximately 140 adults in the population of Ireland, so the sample is very representative, and we are conscious that it may become easy to identify people if we release large suites of data over time. So we always weigh up the benefits of publicly accessible data for high-quality research against our duty to our participants to protect their anonymity.
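The "one respondent represents about 140 adults" figure is the idea behind a simple design weight: the inverse of the probability of selection. A minimal sketch using only the two rounded numbers from the talk; real survey weights are typically more elaborate than a single constant, so treat this purely as an illustration of the arithmetic.

```python
SAMPLE_SIZE = 8500           # approximate wave one sample (from the talk)
ADULTS_PER_RESPONDENT = 140  # approximate figure quoted in the talk

# If every respondent carries the same weight w, the weighted sample
# total estimates the size of the target population: N_hat = n * w.
implied_population = SAMPLE_SIZE * ADULTS_PER_RESPONDENT

# Equivalently, w = 1 / selection probability, so the implied
# probability that any given adult was sampled is 1 / w.
selection_probability = 1 / ADULTS_PER_RESPONDENT

print(implied_population)               # 1190000 adults in the target population
print(f"{selection_probability:.4f}")   # 0.0071
```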
With this in mind, we use a number of anonymization techniques, developed in collaboration with the Irish Central Statistics Office, to do everything we can to avoid the possible identification of individuals. The first thing we always do is remove highly sensitive information, for example names and addresses. We also remove potentially identifiable information, meaning data from which an individual may be identified when it is used in combination with other information about them. We do a number of things to avoid this. We often group variables together: for example, in the wave two release, rather than including the individual medical conditions, we grouped them using ICD-10 codes. Otherwise, if you had data from wave one and wave two showing that somebody was 57 years old, female, had six children and subsequently developed a specific disease between the waves, that person could be highly identifiable. So we rigorously went through the data to identify any possible breaches and then used techniques such as grouping, top and bottom coding, generating new variables, or dropping anything where we felt people might be identifiable. The questionnaire domains in TILDA mirror those of SHARE: the three main domains are health, social and economic. In general, to be comparable with the other studies, we use scales and standardized measures where possible, and the questionnaire, which is accessible online, documents any cases where our questions are similar to SHARE or the HRS. This screen looks a lot like something Stefan presented: it's a list of all the modules within the CAPI questionnaire in TILDA. I won't go through each domain or module, but all three domains are covered, and each module has a module code; for example, the cover screen has the code CS, and PH relates to physical
health and cognitive function, which I'll come to in a minute as an example. The naming conventions for the variables in the TILDA data set are based on the collection method used, the section of the questionnaire the variable comes from, and the question number, and the labels used in the data set are usually shortened versions of the questions in the original questionnaire. Here is an example of how we name variables. This particular variable comes from the CAPI, and the question asked of the respondent was "Do you usually wear glasses or contact lenses?" The number of this question in the questionnaire is PH101: PH is what we call the section identifier, which tells us that it comes from the physical health section, and 101 is the question number within that section. The variable in the data set is named identically to the question, and the question can be asked in various forms depending on the type of interview the person is undertaking. The next example is slightly different: the previous screen related to a question with one possible answer, but a number of our CAPI questions allow multiple answers. For example, question PH201 asks "Has a doctor ever told you that you have any of these conditions?", followed by a list of about 20 medical conditions, and respondents are free to endorse as many as apply to them. In this case PH is again the section identifier, 201 is the question number, and _05 identifies the option, or loop, within that question, so somebody could have endorsed ph201_05 in addition to ph201_03 or ph201_04. For a variable from the self-completion questionnaire the naming is very similar, but this time the prefix is always SCQ, in capital letters, which indicates that this particular variable comes
from the SCQ rather than the CAPI; the question identifier is then PH1, and again the label, "cut down on drinking", is a shortened version of the question. Similar to the SHARE model, we also have an extensive number of derived variables. This is where we have done significant cleaning of the raw TILDA data, usually when we use a number of questions to build a scale. A lot of the mental health variables in TILDA are scales, and we do the derivation: we use the individual items to compute the final scale score. You can identify derived variables in the data set because, unlike the raw data, their prefix is always in capital letters. For example, the prefix for chronic diseases is CHR. Here you can see that the CHR variable gives the number of chronic diseases; COG refers to cognitive tests, so the COG MMSE variable is the fully scored Mini-Mental State Examination; the MD prefix refers to medications, so the MD polypharmacy variable takes all of the individual medications we collect and determines whether somebody is taking five or more; and MH is the prefix of the derived variable that gives the total score on the Hospital Anxiety and Depression Scale. Throughout the data set, despite the different techniques and types of questions, we try to use standard response options where possible. These are separate from the standard yes/no answers or from questions with options one to four, and they are generally used for questions with multiple responses. For example, if somebody is given a list of conditions and doesn't endorse any of them, they have the option to say "other", which is coded 95, or "none of the above", which is coded 96, and, specific to the cognitive tests, 97 means they were unable to carry out the test. For every single question in the data set we have a "don't know" and a "refused" option, and these are coded
very simply as 98, -98 or _98 for "don't know" and 99, -99 or _99 for "refused". You will also see -1 quite commonly in the data set; it identifies people who didn't respond to a question because they weren't routed to it, not because they refused to answer. If you access the data through ISSDA or the University of Michigan, we provide all the supporting documentation necessary to assist with your analysis. There's a design report, which describes the overall sampling and planning of TILDA from the outset. The questionnaire for each wave is available, including all of the questions, the routing instructions, and some nuances in question wording that depend on the type of interview: whether it's your first time in the study, whether you're a repeat interviewee, or whether somebody is completing a proxy interview on your behalf. There are also release guides that accompany each version of the data set and are updated every time we update it. There's a derived variables codebook, which is very instructive and hugely useful, and we also document the anonymization actions we take. In addition, at each wave we produce a main findings report, which covers some of the data collected in that wave. That's it in terms of TILDA, but I'm happy to answer any questions anybody has.

Great, thank you very much, Siobhan. We'll move promptly on to Drystan to hear about the Gateway to Global Aging Data.

Thank you, and thank you Siobhan and Stefan for making my job much easier by introducing these two studies. As Siobhan mentioned, they are part of a network of aging studies conducted around the world in similar ways, so I'll briefly go over the studies being done in addition to SHARE and TILDA. These include the HRS in the US; in Mexico there's the MHAS; in England there is ELSA; there is of course SHARE; in Korea there's a
study called KLoSA; there's also a study in Indonesia called the IFLS; in Japan there's a study called JSTAR; there's a multi-country study called SAGE, conducted by the World Health Organization; of course TILDA; there's also a Costa Rican study called CRELES; in China they have the CHARLS study; and lastly, in India, we're currently in the field with the Longitudinal Aging Study in India (LASI). As you can imagine, all of these studies are very complicated, with lots of questions and lots of different modules, so we designed a central resource for people to access them. All of these studies share some key innovations. They are multidisciplinary in their subject matter: as we've seen, they have questions about demographics, health and economics. They usually have enhanced-quality economic data: these studies generally use unfolding bracket questions to reduce the missingness that sometimes occurs for economic questions in survey data, and most of them provide imputations, as Stefan mentioned for SHARE, to fill in the remaining missing data. They also integrate biomarkers into social surveys, such as blood pressure, blood specimens, or health measures like the excellent ones in TILDA. They all share core content areas, which are usually divided into survey modules: demographics, health, health services, work and employment, economic status, and family structure and social networks. We developed the Gateway to Global Aging Data to be a central place to access all of these data sets and to find information about them. The Gateway is available at www.g2aging.org. It includes a library of all the survey questionnaires; flow charts that illustrate questionnaire skip patterns; a search engine to locate specific survey items; statistics shown as interactive graphs and tables; documentation of cross-study comparability; a publications section, so you can find other
research that's been done using these data sets; and harmonized data. These harmonized data sets are a bit like the easySHARE data set Stefan mentioned, in that they don't contain the entirety of any survey but selected measures that are constructed to be comparable across surveys and across waves of the same survey. So they are a subset of the survey data: the measures that are most frequently used and most easily harmonized across studies. As I mentioned, the Gateway to Global Aging Data is available at g2aging.org. On our first tab, Surveys, you can see all the surveys that are included. You'll notice that the HRS here in the United States started in 1992, while SHARE, as Stefan mentioned, started in 2004 and covers Europe and Israel. For any of these surveys we can go into the survey itself: if we go into SHARE wave one, we see the different modules of SHARE, and if we then go into the module called Behavioral Risks, we can see the questions asked in SHARE, either as a flow chart or as a list of items. The first question is an "ever smoked daily" question. The question in SHARE reads: "The following questions are about smoking and drinking alcoholic beverages. Have you ever smoked cigarettes, cigars, cigarillos or a pipe daily for a period of at least one year?" We show that the answer choices are yes and no, that this variable is used in the harmonized SHARE data, and we also list concordant items, that is, the same item as asked in waves of SHARE other than 2004. So the survey questionnaires show the location of every survey item within the interview, how the question was asked, and to whom. Again, as Stefan mentioned, it's really important to know that not all questions are asked of everyone who is interviewed; for instance, questions about the
household itself are often asked of just one person in the household. We include links to the microdata variables and how their values are formatted, and we've assigned research topics, keywords and domains for easier searching. In addition to the core interviews, or CAPI interviews, we have also indexed the self-completion questionnaires, the life history interviews, the health assessments and the exit interviews. Then we have flow charts. If you're new to survey data, understanding the survey logic, or skip pattern, is really important. Here in SHARE 2006 we see the first question, as before: "The following questions are about smoking... Have you ever smoked for a period of at least one year?" If you answer yes, you are then asked, "Do you smoke at the present time?" So if you as a researcher want to make a variable that captures smoking at the present time, you need to account for both the "smoke at the present time" question and the "ever smoked daily" question to complete the measure. These flow charts are a helpful way to visualize the survey skip pattern and to see all the questions you need to account for to build a full measure. We also built the concordance, which is our search, so users can identify comparable survey measures across the HRS sister surveys using keyword search, top-level research domains for all HRS sister studies, and specific research topics for as many studies as they like. This allows users to compare measures across multiple waves of one study, or in the same year across multiple studies. For instance, if we go back to our website and click on the Concordance tab, we could search for something like "glasses", as we saw earlier, and select TILDA; you can see the waves of TILDA available here. Let's say we search 2010: we'll get all the questions in TILDA that are
related to glasses, and you can see the first one is the one Siobhan mentioned, PH101. You can do this for as many surveys or as many years at a time as is helpful for your research. For some of the harmonized data sets, you can also select, for instance, the Harmonized SHARE for 2004 and choose the birth date variables or the education variables, and we'll give you those variables without any keyword search. We've also included a lot of documentation. As you can imagine, these surveys can be quite different between waves of an individual survey and between different surveys, so we built domain-specific comparison tables for a number of topics; I won't read them all. We've also created domain-specific user guides, which are quite detailed accounts of all the questions asked on a particular topic, how they're asked in different surveys and how people deal with the differences, and they're really great starting points. All of the documentation is available on the Gateway under the Documentation tab at the top right. Here are the summary tables: for instance, the cognition summary table will tell you, for an immediate word recall test, how many words were used in that test in different studies. Below these cognition tables you can find links to our working paper series, which again are quite long but very detailed. We built interactive graphs and charts because we wanted to give users a way to work with the data a little before downloading it and thinking about things like weighting and accounting for complex survey design. For instance, here you can see a graph of total family income using ELSA data, so it represents England, and here we look at three age groups. You can see the distribution is different for these three age groups, as you would expect as people move from a more
earnings-based income to a more pension-based or savings-based income. Here's another example of a graph: this one captures whether people are currently working for pay, in 2010, in China, Japan, Slovenia, Sweden, England and the United States. One of the things you can see is some extreme variation. We've limited this to people aged 55 to 64, whom you might reasonably expect to be of working age, yet in Slovenia less than 25% of people are currently working, while in Sweden it's more than 75%. These are both SHARE countries, so we know that the question used to derive both figures is exactly the same. So you get an idea of the variation between countries, and it's a really great natural setting for studying the impact of policy and culture across countries. Here's an example of a table as opposed to a graph: this is the net value of primary residence in the United States between 2000 and 2012, and one of the things you can see here is the housing crisis in the United States, where housing prices started to fall in 2008 and continued to fall until 2012. You can download all of the graphs and tables; we just ask that you cite them. We've also built a publication search, as I mentioned, so users can find publications based on health and retirement studies around the world that are relevant to their research focus. I'll jump back over to our website briefly. On the Publications tab you can say, for instance, that you're interested in publications that use SHARE data and have something about cognition in the title, and for any of these we include a direct link or a Google Scholar link to the paper, if available. You can also export any of the citations you like as a TXT or EndNote file, so it's a really helpful way to get started with research using this family of studies
and knowing what's already out there. There is some slight delay in our updating of the publication search: we try to go through it every three or four months, see what's new, and add it. Lastly, I'll mention the harmonized data files. The harmonized data sets are created to provide harmonized measures for HRS-type surveys, so variables are defined as similarly as possible to the RAND-HRS, which is an easy-to-use, user-friendly version of the HRS data and the version most commonly used in the United States. Each data set combines all waves, so each individual has one record, and we use a simple variable naming convention: for instance, a variable named R1WORK captures whether the respondent is currently working in wave one. We also include country-specific variable names: for instance, a variable like R1LBRF_C (labor force, underscore C) in the CHARLS data, the Chinese data set, tells you that it captures the respondent's labor force status in wave one but has a different response scale from some of the other studies. We also include spouse versions of most variables. One of the advantages of harmonization is that we take care of accounting for the survey skip patterns. Here's a really easy example of how we accounted for a skip pattern to make a variable that captures whether the respondent smokes now. This one saves you only a small amount of time, but they get much more complicated, especially the income and wealth variables, which can take potentially months of programming; if you're not interested in programming every component of wealth or income for a family, these can be invaluable resources. We include lots of documentation in all of our harmonized codebooks. Each harmonized data set is accompanied by its own codebook, which includes summary statistics for each variable. We detail the variable creation and any assumptions made in it, and we highlight any differences between waves, and between this
harmonized variable and the RAND-HRS variable; we always use the RAND-HRS variable as our base point for comparing other studies' variables. We also list all the variables from the raw data set used in the creation, so if you disagree with our assumptions, you can create the variable on your own. Here's an example of the codebook from the Harmonized SHARE: you can see we make a "smoke ever" variable and a "smoke now" variable, both for the respondent and the spouse, for waves one through five, and of course, as Stefan mentioned, wave three was not a panel wave. We include descriptive statistics, so you can see the tabulations. You'll also notice that we use special missing codes, which are available in SAS, Stata and SPSS, to give you more information about why a value might be missing. We include a lot of text about how the variable was constructed, we mention differences from the RAND-HRS, and lastly we list the SHARE variables that were used in the creation. Here are all the harmonized data files currently available: the Harmonized ELSA, which covers the first seven waves of ELSA; the Harmonized SHARE, which currently includes waves one through five of SHARE (we are working on incorporating wave six now); the Harmonized JSTAR for Japan; the Harmonized KLoSA for Korea; the Harmonized MHAS for Mexico; the Harmonized CRELES for Costa Rica; of course the Harmonized TILDA for Ireland; the Harmonized CHARLS for China; and the Harmonized LASI, which currently incorporates just the LASI pilot data, collected in 2010 in India. The harmonized data files are distributed either through the Gateway or by the original data provider. You can find links to all of the harmonized data and other data sets on the Download tab, where we link to wherever you download the survey data. As Stefan and Siobhan mentioned, you usually have to sign up with each data set, and the data provider will provide the data for you. You can also
download the harmonized data sets, the codebooks and the code used to create them, in case you want to see exactly how these data sets were created. To download these and use some of our more advanced features, we do ask users to register on our site at g2aging.org; it's quite simple, and we'll send you a confirmation.

Great, thank you, that was very useful. Okay, so questions. The first one I'll put to Stefan: someone is interested in using SHARE data for journalism; they work at a newspaper. Is there any way they can access SHARE data?

Can you please repeat the end of the question? I didn't get it.

Yes: someone who works at a newspaper is interested in using SHARE data. What are the access arrangements in this case?

Well, there's no general answer to that. When you apply for data access you have to fill in the user statement, and if you don't have a scientific affiliation, there's a page two of the user statement where you describe the project you want to use the data for. Then we decide individually whether to grant data access. So the user who is interested should just fill in the second page of the user statement, and then we decide individually.

Okay. And following on from that, another question relates to someone who wants to use the data for a master's thesis or dissertation. Is that the same advice?

Well, a master's thesis or dissertation is definitely a scientific purpose, so I do not see any problem there.

Okay, great. And could I put the same questions to Siobhan?

Yes, our data are widely used by master's and PhD students. Obviously you'd be affiliated to an institution if you're completing a master's, and that's the affiliation you would use, and the data are widely used for that purpose.

Okay. And what about the case of journalism?

I don't currently know; I can't answer that question, so I think it
would be worth going to the ISSDA site, which is the Irish Social Science Data Archive, and making an application. I'm not sure what the follow-up process is then, or whether there are further measures that can be taken or an agreement that can be reached, but it's probably worth making an initial application.

Okay, brilliant. Tristan, do you have anything else to add about access in those two situations?

No, I'd just mention that each study does have its own specifications for who can apply for the data and use it, and how easy that is. I will say that you could absolutely use everything on the Gateway if you were in journalism or in school, but in general we don't distribute the data ourselves. So if you're interested in using the data themselves, and not just information about the questionnaires or what you can generate from our graphs and tables, you will have to go to the data providers.

Okay, brilliant. Now I have a question about data management, and this was put to Siobhan. The question was: do you have any observations about data management, and about developing data management plans for studies such as TILDA, especially the issues around preservation of sensitive data?

Yes, I touched on this in the presentation: we are very conscious that we don't want our data to become identifiable, and we have an ongoing data management process and a growing data management team that oversees all of this. The information we release on the website is a much smaller version of the data set than the one we house internally, which we call the privileged access data set, and that is only for internal users. For each wave of data that comes in, when we're preparing it for release to the public archive, we go through it question by question, and every piece of information is reviewed with a view to identifying any possible anonymity issues. The other area where
there is a risk of identification, particularly in relation to sensitive information, is when we link our data to external sources such as the cancer registry, the death registry and so on. In any case where we have developed linkages with other sources, they are all governed by a very stringent data agreement, which enforces all of the checks we carry out internally within TILDA and does everything possible to prevent re-identification. Again, the use of these linked data would usually be restricted to a select group of people who have privileged access to the data and are governed by very strict data sharing and data use criteria. As for the future, as these things evolve, I don't know. I'm aware that other studies, ELSA for example, share a great deal in their public data sets because they're not governed by the same agreements; they just share suites of data and don't seem to have that issue. All I can say is that within TILDA we're constantly evolving our data management and trying to be as strict as possible about sensitive information. Hopefully that answers the question; if there are any other points you want us to pick up on, just let us know.

The next question concerns geography, so perhaps, Siobhan, you could answer this one about access to the geographic location of respondents in TILDA.

That's not currently available in the archived data sets.

Okay, great. And what about...

Oh sorry, no: I think we just have a general identifier, an urban/rural divide, but we don't have geocoded data.

Great. Okay, and what about SHARE? What kind of geographic information is there?

Well, in SHARE we have the so-called NUTS codes, which are provided in one of the generated variable modules. I don't know if people know these NUTS codes: it's basically a hierarchical system that divides each country into geographical subdivisions, and it actually depends
on the national data protection law which NUTS level we are allowed to release. So it depends on the country you're interested in, but in general this geographical information is available in SHARE for the time of sampling. We do not have continuously updated information, but for the time of sampling, when the household entered SHARE, we have geographical information covered by the so-called NUTS codes.

And in the Gateway to Global Aging Data, what kind of geographic information is available?

We don't have any information available that's not provided by the studies themselves, so again it depends on the study.

Okay, thank you. I'll go with this question: someone would like to know whether "harmonized" means that the particular data sets can be merged together into one file, if that was desired.

I think that is the ideal goal, but of course it's not always possible, because there are a lot of differences in how the surveys are set up, the questions that are asked, or just the culture or the design of that particular country, maybe its institutions. What we try to provide, though, is a means of getting started with that comparison. A lot of that we do through documentation and through the harmonized data sets, and the harmonized codebooks that accompany the harmonized data sets include lots of information about exactly how comparable all these measures are and how we attempted to harmonize them.

I have a question specifically about SHARE: is the household respondent in SHARE similar to the core respondent in ELSA? That is, should analysis be restricted to such respondents?

Well, I can't say much about the concept of the core respondent in ELSA, but in SHARE we have different types of special respondents. The household respondent is only one special respondent; we also have a so-called financial respondent, who answers financial questions on behalf of the couple, and
additionally we have a family respondent, who answers, for example, questions about the couple's children on behalf of the couple. So my impression is that ELSA's concept of the core respondent might be a little different. I hope this answers the question.

Can I speak to that also? I'm familiar with ELSA, and ELSA does have particular respondents: I know they have a financial respondent, and I think a family respondent, who answer particular sets of questions. I assume the questioner is asking whether the core respondent is simply any individual who's answering, and no, there are particular people who are asked different modules within ELSA, as there are in SHARE.

Sorry, do you have anything more to add to that?

The respondents are not chosen by us. Within the household, we ask who wants to be the household respondent, and who is responsible for the financial situation in the household; that person becomes the financial respondent. So if we interview a couple, we actually ask which of the two is responsible for the couple's finances, and that's how these respondents are chosen.
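As described above, SHARE's financial respondent answers financial questions on behalf of the couple. An analysis of individuals therefore usually needs those couple-level answers attached to both partners. The sketch below shows one way to do that step. It assumes a hypothetical flat record layout: `Person`, `couple_id` and `couple_income` are illustrative names, not actual SHARE or ELSA variables.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Person:
    person_id: str
    couple_id: str
    is_financial_respondent: bool
    # Hypothetical couple-level amount, recorded only for the financial respondent.
    couple_income: Optional[float] = None


def propagate_couple_answers(people: List[Person]) -> Dict[str, Optional[float]]:
    """Map each person to the couple-level income reported by their couple's financial respondent."""
    reported: Dict[str, Optional[float]] = {}
    for p in people:
        if p.is_financial_respondent:
            reported[p.couple_id] = p.couple_income
    # Partners inherit the reported value; couples with no financial respondent get None.
    return {p.person_id: reported.get(p.couple_id) for p in people}
```

When aggregating afterwards, count each couple once rather than summing over both partners. Otherwise the couple-level amount is counted twice.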
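Earlier in the webinar, the SHARE flow chart showed a skip pattern: "Do you smoke at the present time?" is only asked of people who answer yes to the "ever smoked" question. The sketch below shows how a "smokes now" measure might account for both questions. The function name and the use of `None` for missing answers are illustrative assumptions, not the Gateway's actual harmonization code.

```python
from typing import Optional


def smokes_now(ever_smoked: Optional[bool], smokes_at_present: Optional[bool]) -> Optional[bool]:
    """Derive a 'smokes now' flag that respects the questionnaire skip pattern.

    ever_smoked: answer to the 'ever smoked for at least one year' gate question
                 (None if missing).
    smokes_at_present: answer to 'Do you smoke at the present time?', which is only
                       asked when ever_smoked is True (None if not asked or missing).
    """
    if ever_smoked is None:
        # Gate question missing: current smoking status cannot be inferred.
        return None
    if not ever_smoked:
        # Never-smokers skip the follow-up question, so they are not current smokers.
        return False
    # Ever-smokers were asked the follow-up; pass its answer through (may still be None).
    return smokes_at_present
```

Using only the follow-up question would code every never-smoker as missing instead of as a non-smoker. The flow charts help you catch exactly this kind of error.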
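The harmonized files use the naming convention described earlier in the webinar: a prefix for who the variable describes, the wave number, the concept, and sometimes a country-specific suffix, as in R1WORK or R1LBRF_C. The sketch below splits such names into their parts. The R (respondent) and S (spouse) prefixes come from the talk. The H (household) prefix and the exact pattern are assumptions to check against each harmonized codebook.

```python
import re
from typing import NamedTuple, Optional

# Prefix, wave number, concept, optional country-specific suffix, e.g. R1LBRF_C.
# Assumed pattern for illustration; confirm against the harmonized codebooks.
_NAME = re.compile(r"^([RSH])(\d+)([A-Z][A-Z0-9]*)(?:_([A-Z]+))?$")
# R and S are described in the webinar; H (household) is an assumption.
_WHO = {"R": "respondent", "S": "spouse", "H": "household"}


class HarmonizedName(NamedTuple):
    who: str
    wave: int
    concept: str
    suffix: Optional[str]  # e.g. "C" marks a country-specific response scale


def parse_name(name: str) -> HarmonizedName:
    """Split a harmonized-style variable name such as 'R1WORK' into its parts."""
    match = _NAME.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a harmonized-style variable name: {name!r}")
    prefix, wave, concept, suffix = match.groups()
    return HarmonizedName(_WHO[prefix], int(wave), concept, suffix)
```

For example, `parse_name("S1WORK")` gives the spouse version of the wave-one working variable. `parse_name("R1LBRF_C")` flags the labor force variable as having a country-specific scale.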
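Stefan notes that NUTS codes are hierarchical and that the level SHARE can release depends on each country's data protection law. When pooling countries released at different levels, one option is to truncate every code to the finest level they all share. The sketch below relies on the published NUTS code structure: a two-letter country code plus one character per level, as in DE, DE1, DE11 and DE111. The function names are hypothetical and not part of any SHARE tooling.

```python
from typing import Iterable


def nuts_at_level(code: str, level: int) -> str:
    """Truncate a NUTS code to the requested level (0 = country, 1-3 = regions)."""
    if level not in (0, 1, 2, 3):
        raise ValueError("NUTS levels run from 0 to 3")
    code = code.strip().upper()
    if len(code) < 2 + level:
        # The released code is coarser than requested; finer detail cannot be recovered.
        raise ValueError(f"{code!r} is coarser than NUTS level {level}")
    return code[: 2 + level]


def finest_common_level(codes: Iterable[str]) -> int:
    """Finest NUTS level that every code in the collection supports."""
    return min(len(c.strip()) - 2 for c in codes)
```

For example, if one country is released at NUTS 3 ("DE111") and another at NUTS 2 ("FR10"), `finest_common_level` returns 2. Truncating both codes to level 2 then gives comparable regions.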