So we're going to go straight to Ali Bloom, who is a research associate with the UK Data Service at the University of Manchester. Her role involves providing support and training to researchers who want to understand, access and utilise social science data. We have four presentations in this session, which are going to be fairly quick, so I'm going to be strict and keep people to time so that we finish on time. Thanks Ali, over to you. Great, thanks Vanessa. So as Vanessa said, I'm Ali Bloom from the UK Data Service user support and training team. In this session, I'm going to give a quick introduction to some of our learning resources, which can help you explore health data and the other data we have available. Can I just check, are my slides up on the screen? Have we got that? Yes. Brilliant. Just good to check before we get started. So in this talk, I'm going to cover the UK Data Service Learning Hub and our training and events page, which I'm sure some of you are familiar with, as it might be how you booked onto this conference. I'm also going to talk about where you can find our key health data sets and how you can search for data by a particular type, and then give a brief introduction to our YouTube channel. So, starting with the UK Data Service Learning Hub. The Learning Hub is a central point of access to our learning resources and data skills training. You can find it on our website by clicking on the tab at the top that says Learning Hub. Once you're there, you'll see different tiles targeted at different data skills and training needs. For example, we have our New to using data tile, which is targeted at new users, students or anyone who's getting started with data. We've got the data skills modules, which I'll come back to in just a second. We also have dedicated tiles for particular data types.
So survey data, international data, qualitative data. We've also got a dedicated section on census data and also on geographical data, which is data you might want to use for mapping, perhaps using the census or longitudinal studies, as it says here. We've also got another set of webpages on software and tools. These teach you how to use software such as our online tools: things like Nesstar; QualiBank, which allows you to search qualitative data sets by key terms; and UKDS.Stat, which lets you look at international aggregate data. There's also our training on more traditional ways of analysing data, so programming languages such as R and Python and packages such as Stata and SPSS. We've also got a section on computational social science, which gives you information on newer methods, technologies and resources, such as modelling, or Twitter data: scraping tweets and how you analyse them. We have another section on teaching with data, for example if you want to get started with an open access teaching data set. In terms of health, we've got some Natsal data sets and resources around those that are useful if you want to teach about health. And then our data skills modules, which I said I'd come back to. These are interactive modules designed to let you learn about the key types of data: we've got one on survey data, one on longitudinal data and one on aggregate data. We also have our beta module on exploring crime surveys with R, but that's still in development at the minute. If you click on a module, it takes you to where it's hosted on our online platform. You can work through a module at your own pace from beginning to end, or you can choose particular sections that you're interested in. If I click Start course, I'll give you a brief idea of what these modules look like. They're a combination of text, videos and interactive quizzes.
We've got quizzes like this one, where you can check your understanding at the end; this one takes you through the different types of survey data if you're getting started. If I scroll right down to the bottom, you'll also see there's an assessment and a certificate of completion that you can do at the end as well. I'm just going to navigate back to talk about the training and events page, which can be found at the top here. As I said, some of you might have already used it and be familiar with it. From this page, you can search all of our upcoming training and events, and you can filter by particular topics. For example, if I filter by health, we'll see that today's health studies conference comes up. You can also filter by type and category, by whether training is face to face or online (although most of our training at the minute is still online), and by whether it's an upcoming or a past event. Past events are useful to search because we host all the materials and recordings of our past events on these pages. You can also view by calendar if you're looking for an event that happened on a particular date. If you're looking for data on a particular topic, for example health, go to Find data at the top, then click on Browse and access data. You can browse data by theme here. If we click on health, this will take you to our health theme page, which brings together some of our popular key data sets on the theme of health. For example, we've got the Adult Dental Health Survey, ELSA, Natsal, which I mentioned earlier, Understanding Society, and links to some of the census data as well, which could help if you're trying to visualise some of the health data. If you want to view all of the data for health, click on View all data up here and it'll take you to our data catalogue, where you can search for data.
If you want more information on how to search the catalogue in detail, the pinned video on our YouTube channel goes through that, and I'll come back to the channel in a second. So watch that if you want a bit more information on how to search the catalogue. You can also search by topic: if you click on health, you'll see the data specific to health. You can also search by data type, so if you're interested in longitudinal data to look at how health changes over time, you can click on this and it'll filter by that as well. Another way to search by data type, which I'll quickly show you before I move on to the YouTube channel, is again to go to Find data and Browse data, scroll down and click on, for example, Longitudinal studies if, like I said, you're interested in looking at health over time. It'll take you to the catalogue with the filter for longitudinal studies selected. Ali, you've just got one minute left. Brill, thank you, Vanessa. The final thing I want to highlight is our YouTube channel. Here you can find video tutorials on topics such as how to use our online tools, like Nesstar, which I mentioned earlier, or how to download things like boundary data. We also have training playlists, themed by topic: for example, if you want all the information on accessing data, citing data or computational social science. And we have a playlist of all of our past events, so if you're looking for a recording of an event like this and want to watch it back, you can find it on our YouTube channel. That concludes the presentation. Brilliant, thanks, Ali. I'm not seeing any questions in the Q&A box, so for the sake of time we'll move on to our next speaker and deal with questions as we go along. Great, thanks, Vanessa. OK, Bethan, would you like to put your slides up and I'll introduce you?
So Bethan is a principal information asset owner at NHS Digital. She's been at NHS Digital for over 10 years and she's a senior member of the Data Access Request Service. Bethan leads the team that focuses on commissioning applications and is also the IAO (I'm not sure what that stands for) for a number of assets, including the NHS Digital survey data sets. So over to you, Bethan. Thank you, Vanessa. Can you see my slides? OK. Yes, perfect. OK, so an IAO is an information asset owner. It's a role that was created after GDPR to make sure that, within public sector organisations, we are using assets to their full capacity while doing so legally and appropriately. I've got 20 slides today; don't worry, I'm not going to talk through all of them. The last four or five slides are just some extra information, should you want to find out anything further about NHS Digital and DARS. So let's see if I can get that to move on. OK, so who are we at NHS Digital? We are the national information and technology partner to the health and care system, and our mission is to harness the power of information and technology to make health and care better. Currently, we are a standalone organisation; by this time next year, we will have merged into NHS England. So what's DARS? DARS is the Data Access Request Service: the team of people who facilitate access to health and social care data for organisations such as clinical research bodies, academia, commissioners such as the CCGs, and occasionally commercial companies. The DARS team's role is to ensure that access to personal data is legal, that it's within the IG (information governance) requirements, that the data are held securely and used to improve health and care services, and, paramount, that they're not used solely for commercial purposes. Some key information about DARS: we process more than a thousand applications for NHS Digital data each year, and around half of those are from researchers.
We have 70 data sets available to request from DARS. The majority of them cover England, but we do have a few that cover the devolved nations. Many assets can be linked to other data sets or to cohort data that researchers provide. We also now offer a clinical trial service, which can help identify an appropriate cohort of patients for a planned clinical trial, provide contact details for candidates for the trial, and provide updates on the chosen cohort of patients. As you can imagine, that service has been incredibly busy over the last two years with the pandemic. So, specifically, survey data sets. We have our normal standard data sets like HES, mental health and IAPT, which take data from hospitals and clinicians. Then we have the surveys, which collect data from respondents in their homes, through interviews, and through postal and online surveys. They're much smaller, but the range of questions can often be much broader. Our surveys look into a range of issues around health, lifestyle, mental health, behaviours and choices. I'm not going to go into much detail on the surveys because I imagine most of you are aware of a lot of them, but if you want to find out more about them, there's a link on the slides to our website. This is just a quick list of all the survey data sets we currently have available via the UK Data Service. There's quite a range, as you can see; there's a huge amount of data out there, all available. Later on in the slides, I'll give a quick update on what we still have outstanding and what we're currently working on. So, as Jenny said, and she's correct, there are two ways of accessing the data for most of the data assets through the UK Data Service. There's the end user licence, which means you don't need to go for further approvals, or there's a special licence.
For some of the assets, that can be done just via the UK Data Service and the IAO, which means the UK Data Service sends the request to me to approve. For others, such as APMS and the Mental Health of Children and Young People survey, it needs to go through a DARS application and a data sharing agreement. There's also the opportunity to request bespoke versions of the data, but they can take a bit more time, as they need to go through all the necessary approvals and we need to involve the survey organisations in producing them. This is just a quick snapshot of what we currently have available via the UKDS. So, how do you access the assets? As we've said, for the end user licence, go to the UKDS and apply that way; that's also the route if you want the special licence for HSE. If you want the special licence for Mental Health of Children and Young People or APMS, that goes through a DARS application, and these are the steps you go through to do it. The best thing I can suggest, if you want to go through that process, is to get in touch with the enquiries mailbox, and one of the team will be able to talk you through it and support you. If you want a bespoke request, please get in touch with the surveys team directly. And what's our progress to date? We've done loads of work over the last few years getting the data sets live, but we are a bit delayed at the moment on getting some of them out through the UK Data Service, predominantly due to COVID priorities and the COVID impact on the teams. For Mental Health of Children and Young People, three follow-up surveys of the 2017 cohort have taken place, and the data sets will be available soon. For Smoking, Drinking and Drug Use among Young People, the 2021 data set will be available later this year.
For HSE, work is ongoing on HSE 2016 and 2017, and we're also working on ethnicity data sets. There are plans for a second, slightly reduced data set for APMS, which I assume we'll be making available under end user licence. And how are we trying to make things better for you as researchers and as customers of the service? We have a precedent approved for DARS. That means that for certain types of application, we don't have to take them through the full DARS process, because some of the approvals are already in place, subject, obviously, to you having the necessary legal basis in place as well. There are some limitations: it can only be for not-for-profit research or education, and it can't be identifiable data. And if it's a commercial request, even if everything else fits within our precedent, it would still go through the standard IGARD approval. For anyone who's unaware, IGARD is the Independent Group Advising on the Release of Data: a group of lay people and specialists, independent of NHS Digital, who review our first-of-type and more complex data sharing agreement requests, so that we get an independent view on whether the application is fit for purpose and suitable. We're also working with them to look at improving the approval pathways for the HSE blood bank, data linkage to non-NHS Digital data sets, bespoke data requests, and recontacting cohorts for follow-on research. And that's all I was going to say, Vanessa. That's helpful, thank you. OK, there aren't any further questions, so we'll thank Bethan and move on to our third speaker, who is Neil Kay. Bethan, do you want to stop sharing, and then Neil can share his slides? Hi, Neil. Hi. If you share your slides, I'll introduce you.
So Neil Kay is a research fellow at CLOSER, which aims to increase the visibility, use and impact of longitudinal population studies, data and research. He leads CLOSER's training and capacity-building activities, overseeing the development of online learning resources aimed at students, researchers and policymakers. So you have 10 minutes, including questions, Neil, so fire away. OK, thanks, Vanessa. So today I'm going to talk about the CLOSER Learning Hub and introduce how it can be used in your research. I'm going to talk a little bit about CLOSER itself and what we do, followed by a very quick introduction to our Learning Hub. I'll show you what the Learning Hub looks like, and then towards the end of the presentation show you how to get started effectively and give you a flavour of some of the very helpful animations we've commissioned, which will hopefully help you navigate around the Learning Hub and get what you want out of it. So first of all, CLOSER: who are we? Hopefully most of you will at least have heard of us. We are an interdisciplinary partnership with 19 of the leading social and biomedical longitudinal population studies as partner studies, as well as the UK Data Service and the British Library. Our mission is to increase the visibility, use and impact of longitudinal population studies, the data that come out of them, and research using that data. We have several core areas of work, and many of you will know about several different parts of that, not least the CLOSER Discovery search engine, other projects and partnerships, policy and public affairs and dialogue, and other projects on data linkage and data harmonisation. But the CLOSER Learning Hub that I'm going to talk about today is really the flagship product of our training and capacity-building strand. The CLOSER Learning Hub is very much aimed at newcomers to longitudinal research.
It's very beginner friendly, aimed at students, academics and analysts who are not necessarily familiar with longitudinal population studies. It helps you navigate what they are and how to explore them, and to put your research skills into practice. It does this by setting out the information step by step, mirroring the process of answering a research question. For example, you can use the Learning Hub to think about how you can use data from these studies in your research, what kinds of questions you can answer with this type of data, where you can access the data, how you can analyse it, and the different ways your research can progress throughout that process. So, if you're unfamiliar, what does the Learning Hub look like? This is the homepage. It gives you an overview of longitudinal population studies, and from here you can access the different areas of the site on the top bar. It's divided into four main areas. The learning modules are where the bulk of the information on different themes is located; that menu drops down and you can navigate through each heading. Teaching resources: if you're not a student but a tutor trying to impart this information to your students, which I'd recommend everyone does, more teacher-friendly resources and formats for the information are provided in that section. In addition, there's access to a couple of training data sets there. Then we have research case studies, which give real-world applications of how research using longitudinal population studies data has been done, and the explore by topic section. There's also the glossary, which I think is hugely beneficial, especially for people who are unfamiliar with a lot of the terminology.
So, if you get access to these slides, this link will take you to the full introductory getting started video. But what I'm going to try and do now is show you the learning hub. Hopefully you can see that. And you'll see we've got this additional getting started tab at the top, which will take you to, yes, the videos, the getting started animations, which are helpfully divided into six parts which cover a whole range of different areas of the learning hub site. And even more ambitiously, I'm going to try and show you just a couple of these animations. They're not very long, but just to give you a flavour. The learning hub is divided into four main areas. Learning modules, teaching resources, research case studies, and an explore by topic section. There is also an extensive glossary available in the top right-hand corner to provide a more detailed explanation of some of the more complex terms. The learning modules contain information, videos and interactive quizzes in six thematic subsections. The first two of these are useful in providing an overview of what longitudinal research is, how longitudinal data are collected, and why longitudinal data are particularly valuable for answering important research questions. Longitudinal studies share a common aim to examine change over time and to capture events in people's lives as they age. To this end, a longitudinal study is a prospective observational study that follows the same people over a period of time, repeatedly collecting information from them. They differ from cross-sectional studies, which interview a fresh sample of people each time they are carried out. Many longitudinal studies collect a broad range of information about different areas of their participants' lives. This makes them incredibly valuable when looking at the way different aspects of our lives interact and how early life circumstances or characteristics relate to outcomes in adulthood, middle age or later life. 
To learn more about using data from longitudinal studies in your research, visit the CLOSER Learning Hub. As you begin to consider your research topic, you may already know what research question you would like to answer, or you may have a broad area of interest that you would like to investigate further. In any case, you may be in search of a little inspiration to set you on the way. Longitudinal studies provide a rich source of social science and biomedical data and can be used to answer a whole range of research questions. The Learning Hub provides research case studies, each of which details a piece of published academic research using longitudinal study data. The case studies currently available on the Learning Hub cover topic areas including social media and well-being, ethnic differences in unemployment, the rise of the obesity epidemic and childhood bullying. The case studies provide examples of real-world research, outlining the research questions asked, the study and data used, the key findings of the analysis and the implications these might have for policy or further research. The case studies also discuss the advantages of using longitudinal data to explore the topic in question. CLOSER aims to regularly add new research case studies to the Learning Hub to provide an even broader range of examples of how longitudinal research has been applied in real-world settings. To learn more about using data from longitudinal studies in your research, visit the CLOSER Learning Hub. So, that's all I had to say, and I'm happy to take any questions about the Learning Hub or about CLOSER more generally. Thank you, Neil. We'll move on to our next and final speaker of the day, who is Nazir Rajar. Nazir, do you want to put up your slides, and I'll introduce you? Nazir is a research fellow at UCL's Centre for Longitudinal Studies.
His recent work has focused on administrative-linked cohort data, and his broader research interests lie in the economic effects of ill health. You actually have slightly longer: 10 minutes plus questions and answers. I'll pop up when you've got about two minutes left, so take it away. Yes, my name is Nazir. I work with Dr Richard Silverwood, whom some of you might have met earlier. In this presentation, I'm going to go through the work we've been doing on the linked Hospital Episode Statistics (HES) data and how we can use it to aid the handling of missing data. I've got 10 minutes, like you said, so I'll try to be speedy. The co-investigators on the grant are Lisa and George from CLS and Bianca and Katie from UCL. So, a brief outline of what I'm going to go through today: a bit of background; then an overview of the data, the NCDS and the recent HES linkage; then the work we've been doing to identify predictors of non-response using the HES data; then restoring NCDS sample representativeness, for which we have some preliminary results; and finally some conclusions. A bit of background. Non-response is quite common in longitudinal surveys. I think one of the biggest issues with non-response is the introduction of bias, because people who respond tend to be fundamentally different from people who don't. In some of the work we've seen in other longitudinal data, people from an ethnic minority background are less likely to respond as time goes on. It's the same with education: people who are more highly educated are more likely to respond. So this is the analytical strategy. There's growing interest in whether linked administrative data has the potential to aid analyses subject to missing data in cohort studies.
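The point about non-response bias, that respondents differ systematically from non-respondents, can be made concrete with a toy calculation. The Python sketch below is a hypothetical illustration: the cohort values, the split by education and the response probabilities are all invented, not NCDS figures. It compares the full-cohort mean of an outcome with the expected mean among respondents when the chance of responding depends on education.

```python
from statistics import mean

# Hypothetical cohort: (highly_educated, outcome_score). Invented values.
COHORT = [
    (True, 70), (True, 65), (True, 72), (True, 68),
    (False, 50), (False, 55), (False, 48), (False, 52),
]

# Assumed probability of responding at a later wave, by education group.
RESPONSE_PROB = {True: 0.9, False: 0.4}


def expected_respondent_mean(records, response_prob):
    """Expected mean outcome among respondents: each member is weighted by
    their probability of responding (ratio of expected totals)."""
    weights = [response_prob[educated] for educated, _ in records]
    weighted_total = sum(w * score for w, (_, score) in zip(weights, records))
    return weighted_total / sum(weights)


full_mean = mean(score for _, score in COHORT)
respondent_mean = expected_respondent_mean(COHORT, RESPONSE_PROB)
bias = respondent_mean - full_mean  # positive: respondents overstate the cohort mean

print(f"full cohort {full_mean:.2f}, respondents {respondent_mean:.2f}, bias {bias:+.2f}")
```

Because the highly educated members in this made-up data both score higher and respond more often, the respondent mean sits above the full-cohort mean. Methods that use predictors of response, such as the auxiliary variables discussed in this talk, aim to close that gap.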
So we want to identify predictors of cohort non-response in the linked administrative data, and explore whether including the identified variables as auxiliary variables adds any value, beyond the survey predictors already identified, in restoring sample representativeness. Today we are going to focus on the National Child Development Study (NCDS) and Hospital Episode Statistics (HES) from NHS Digital. So these are the two data sets. I'm sure some of you are familiar with the NCDS; it's actually a really cool data set. It's a longitudinal birth cohort study that tracks all babies born in a single week in Great Britain in 1958, initially just under 17,500, and it was later augmented by immigrants born in the same target week. It spans a wide range of topics: economic circumstances, social participation, family life, health, et cetera. On to the Hospital Episode Statistics data: it's a collection of databases containing interactions with NHS hospitals, in England only. It's broken down into four databases: admitted patient care (APC), critical care, accident and emergency (A&E) and outpatient appointments. Unfortunately, I guess there aren't too many people in critical care, so we don't actually use that data or any variables that come from it. The data include the dates, diagnoses, procedures, patient demographics and hospital characteristics for each hospital episode. There can often be multiple episodes per admission, and the availability of the data differs slightly: APC data are available from 1997, outpatients from 2003, A&E from 2007, and critical care, which as I mentioned we didn't use, from 2009. The linkage between the NCDS and HES was undertaken on the basis of consent at age 50, which was sweep eight. This is a flowchart giving an overview of our analytical strategy; I'll go very quickly through the headline figures.
There are just over 18,500 people. We then limited to those who lived in England at wave six and wave nine, and to the wave nine target population, that is, those still alive and living in the UK. Then we restricted to those who consented to linkage at wave eight and who have pre-2013 HES data. It's important to use pre-2013 HES data because 2013 is when the wave nine (age 55) NCDS survey data were collected, so the HES predictors come before the outcome. And on to the HES predictors of non-response. We derived the variables from HES APC, outpatient and A&E data: 58 variables relating to the number of admissions and appointments, missed appointments, investigations undertaken, diagnoses (including ICD chapters, which is quite a formalised process) and treatment received. We assumed that if a cohort member was eligible for linkage and consented to it, but didn't have any linked data, they truly didn't have a relevant interaction with the NHS. So, for example, in the APC data, if they didn't receive a diagnosis of something, we coded that as not having received the diagnosis rather than as missing. The technique we used was the least absolute shrinkage and selection operator, a bit of a mouthful, shortened to lasso, applied to the derived HES variables. We start with 58, and the lasso removes the variables that are not influential in predicting non-response at age 55. We use a penalty, the lambda value, determined by 10-fold cross-validation: we split the data randomly into 10 chunks, fit the model for a range of lambda values on nine chunks and evaluate it on the held-out chunk, rotating through all 10. It then selects the lambda value that gives the minimum mean cross-validated error; in this instance, the minimum misclassification error. After the lasso, 10 variables were selected: number of A&E appointments, treatment for adult mental illness, proportion of appointments missed in outpatients.
There are also five ICD chapters, one of which, for example, is ICD chapter four, endocrine, nutritional and metabolic diseases; and two operation codes in APC, one of which is operation code T, soft tissue. Now, restoring NCDS sample representativeness. We took two approaches. First, we wanted to see how well the HES variables do in restoring sample representativeness, and we can only do that amongst HES linkage consenters. This isn't typically what you'd do in your own research, but we wanted to know how well it works: amongst the whole cohort, there are people who have not consented to health linkage, so they don't have any HES information, and it wouldn't be right to use the same approach to assess the ability of HES variables to restore sample representativeness. So I've broken that down into two. This slide is amongst the HES consenters only. There's quite a lot to unpack here. On the y-axis we have the estimate, which is cognitive ability at age seven for the NCDS cohort members. Then we have the analytical steps we've taken. We begin with all respondents at age seven, about 14,400 people, and then the target population at age 55, just under 13,000. This is the bias introduced by being a HES consenter. That's the target sample in this analysis, what we want to restore back to: the reference point. And this is the bias introduced by being a consenter and also a wave nine respondent. These are the three approaches using multiple imputation, with 20 imputations; we'll run more later on. This is the extent to which each can restore sample representativeness. You can see that using the HES-only variables in multiple imputation doesn't add that much, relative to the survey variables, which do quite well in restoring sample representativeness.
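The talk doesn't spell out how the results from the 20 imputations were combined. The standard approach is Rubin's rules, sketched below in Python as an assumption rather than the team's code (in R, the mice package's pool() function applies these rules, though the talk doesn't say which package was used). Each imputed dataset yields an estimate and its squared standard error; the pooled estimate is their mean, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m).

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)           # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, sqrt(total)


# Hypothetical usage: four imputations with invented results.
est, se = pool_rubin([1.0, 2.0, 3.0, 2.0], [0.5, 0.5, 0.5, 0.5])
print(f"pooled estimate {est:.3f}, standard error {se:.3f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing data; with more imputations the (1 + 1/m) inflation shrinks towards 1.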
You can see that it's close to the reference value there. And with survey plus HES variables, again, it's difficult to say, because the survey variables have already done quite well. Then this is amongst the whole target population, which is perhaps what you'd do in your own research; here it's not possible to assess the ability of HES-only predictors to restore sample representativeness. So here we have all respondents, again 14,407, and then the target population. If you were to use only wave nine respondents, this is the bias that would introduce. And you can see that using the previously identified survey variables restores sample representativeness. It's not possible to determine how well the HES variables do in this regard, because the survey variables have already done quite well. So, conclusions. We identified HES variables that are predictive of non-response at wave nine, when cohort members were 55 years old. We incorporated these variables as auxiliary variables in the multiple imputation analysis, and they had relatively limited impact in restoring sample representativeness: we didn't really find an additional gain relative to using only the survey predictors that CLS already has. Whilst this finding may not extend to other analyses or NCDS sweeps, it does highlight the utility of survey variables in handling non-response. That's quite useful, because survey variables are much easier to get than going through the process with NHS Digital of obtaining licences, the training and so on, and they're easily implemented in standard software. In this analysis, we used R. Here are some references; this work was funded by the ESRC as part of the CLS grant. Thank you very much for listening. I went through that quite quickly, so if you have any questions, please just let me know.
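The variable-selection step described in this talk, a lasso-penalised logistic regression of wave nine non-response on the derived HES variables, with the penalty lambda chosen by 10-fold cross-validation to minimise misclassification error, can be sketched in Python. This is a hypothetical, self-contained illustration, not the team's code: the function names, the proximal-gradient solver, the 0.5 classification threshold and the simulated data are all assumptions. The talk only says R was used; there, a package such as glmnet (cv.glmnet with type.measure = "class") would typically do this job.

```python
import numpy as np


def sigmoid(z):
    # Clip the linear predictor so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))


def fit_l1_logistic(X, y, lam, n_iter=500):
    """Lasso (L1-penalised) logistic regression via proximal gradient descent.

    Minimises mean log-loss + lam * sum(|beta|); the intercept is unpenalised.
    Columns of X are assumed to be standardised beforehand.
    """
    n = X.shape[0]
    Xa = np.column_stack([np.ones(n), X])           # prepend an intercept column
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2     # 1 / Lipschitz constant of the gradient
    theta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        grad = Xa.T @ (sigmoid(Xa @ theta) - y) / n
        theta = theta - step * grad
        # Soft-thresholding shrinks coefficients and sets weak ones exactly to zero.
        theta[1:] = np.sign(theta[1:]) * np.maximum(np.abs(theta[1:]) - step * lam, 0.0)
    return theta[0], theta[1:]


def cv_lasso_misclassification(X, y, lambdas, k=10, seed=0):
    """Pick lambda by k-fold CV on misclassification error, then refit on all data.

    Returns (chosen lambda, indices of selected predictors, mean CV error per lambda).
    """
    lambdas = sorted(lambdas, reverse=True)         # largest penalty first
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    cv_errors = []
    for lam in lambdas:
        fold_errors = []
        for i, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            b0, beta = fit_l1_logistic(X[train], y[train], lam)
            pred = (sigmoid(b0 + X[test] @ beta) >= 0.5).astype(int)
            fold_errors.append(np.mean(pred != y[test]))
        cv_errors.append(float(np.mean(fold_errors)))
    # argmin returns the first minimum, so ties go to the larger (sparser) penalty.
    best = lambdas[int(np.argmin(cv_errors))]
    _, beta = fit_l1_logistic(X, y, best)
    return best, np.flatnonzero(beta), cv_errors


# Hypothetical usage on simulated data: only the first two of six predictors matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 6))
y = (rng.random(400) < sigmoid(3 * X[:, 0] - 3 * X[:, 1])).astype(int)
best_lam, selected, errors = cv_lasso_misclassification(X, y, [0.5, 0.1, 0.03, 0.01])
print("chosen lambda:", best_lam, "selected predictors:", selected.tolist())
```

The soft-thresholding step is what makes this a selection method: predictors whose contribution falls below the penalty are set exactly to zero, which mirrors how the 58 candidate HES variables were reduced to 10. Breaking ties towards the larger penalty is a design choice of this sketch that favours the sparser model.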