So today we're going to be introducing you to the Smart Energy Research Lab. This is a joint webinar this afternoon. I'm Deb Wiltshire, I'm based at the UK Data Service and I'm part of the team who make a whole host of different data available for researchers. And I'm very, very happy to have Dr Ellen Webborn with me this afternoon from the Smart Energy Research Lab team, and she's going to be introducing the lab and the data. I'm going to start by introducing the presenters today. So as I mentioned, my name's Deb Wiltshire. I'm based at the UK Data Service, and myself and my colleagues are responsible for making a whole host of different datasets available for researchers. It's our job to guide you as researchers through the process of applying for and accessing these data, and we're there also to support you throughout the life cycle of your project. Joining me, as I say, is Ellen. Now Ellen is a research associate in the Smart Energy Research Lab team, and they are based at the UCL Energy Institute, and her role includes the preparation and data quality analysis of all of the SERL datasets that are available. So we're going to cover a lot of information this afternoon. We are recording the webinar and we will make that available afterwards. So if there's anything you miss and you want to revisit, then we'll get that up on the website as soon as we can over the next week. Likewise we'll make the slides available, so don't worry about trying to take copious notes. We'll make all the information available to you afterwards. So the webinar's really split into two sections today. Ellen's going to start by introducing the Smart Energy Research Lab and she'll talk to you about the data that's available, and then she'll hand back over to me and I'll talk about the application process itself. So hopefully it's going to be a really interesting, informative afternoon.
The goal is we'll talk probably for about 40 minutes and then, as I mentioned right at the start, we'll have some time for questions at the end. So without further ado I'm going to stop sharing my screen. I'm going to hand it over to Ellen, who's going to introduce the project. Okay, can you see my slides? Yeah, perfectly, thank you. Great, okay. So I think Deborah's done a great job of introducing the agenda, so we'll skip through that. We'll start with the project overview. So SERL is a six million pound, five-year EPSRC project and we've got another 18 months or so left. UCL is leading a consortium of eight partners, with their logos there at the bottom, and the aim is to provide an energy data resource for the UK research community. So that means providing high quality smart meter and linked contextual data for innovative research in the public interest. So we are recruiting and collecting the data for around eight to ten thousand households in GB, and we aim to be representative of the SMETS2 population, so people with second generation smart meters. So as you're probably aware, high resolution smart meter data could be a real game changer for research, but there are substantial barriers at present to getting access for researchers. So you've got the technical barriers of actually accessing that data, the legal barriers around the sensitivity of the data, and then the financial costs of getting access. So to counter this, SERL is providing a central resource for the UK research community. We're already funded, so you don't have to pay for individual access to the data. We're providing a secure lab environment: smart meter data is personal data, so we've got this virtual environment you can work in. Our UK Data Service team has overcome the technical barriers to actually accessing the data from the individual meters. We provide data linking at the household level, and what I'll mainly talk about today is the Observatory dataset.
So this is the dataset of the eight to ten thousand households, and that is there to study on its own or to act as a control group for your projects. And we're also going to be providing a laboratory function where researchers can recruit their own participants, and we'll be able to access their smart meter data for you if you have consent. So far we've recruited around half of our participants and we're currently in our final wave of recruitment, so letters went out last week and we're aiming to reach hopefully 10,000 in the next couple of months. We are open for business, so we've got a UKDS study number, 8666, which you can search on the UKDS website to check out all our documentation and some code, and research has already begun, so we have researchers using the data now in the secure environment. So to give you an overview of the different datasets we're using: there's electricity data and gas data from the smart meters, we've also got weather data, a survey and energy performance certificates, and this combines into what we're calling the SERL Observatory dataset. So the electricity data is daily and half-hourly readings, and in theory all participants should have this. We also have export data if it's available, so if someone has solar panels we get the net export data, and that's active and reactive power. When people sign up to SERL they give us consent to access their data, and we can go back up to 12 months depending on when they moved into the house and when they got a smart meter. The gas data is quite similar, so again daily and half-hourly readings, but not so many people have a mains gas meter, so that's about 70 percent of our SERL participants. The weather data is from the ECMWF and it's ERA5 reanalysis data, so modeled based on readings. It's publicly available at hourly resolution, 30 kilometers spatially.
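Because the meter readings are half-hourly and the weather data is hourly, combining the two means matching each reading to an hourly weather value. The following is only a minimal sketch of one way that could be done, not the SERL team's method. It assumes each reading carries an identifier for the participant's nearest weather grid point, and that a half-hourly reading is matched to the hour containing its period start. The names `grid_cell`, `link_weather` and all the values are hypothetical.

```python
from datetime import datetime

# Hypothetical hourly temperatures keyed by (grid_cell, start of hour).
weather = {
    ("cell_A", datetime(2020, 1, 1, 0)): 4.2,
    ("cell_A", datetime(2020, 1, 1, 1)): 3.9,
}

# Hypothetical half-hourly electricity reads:
# (participant_id, grid_cell, period start, energy in Wh).
reads = [
    ("p001", "cell_A", datetime(2020, 1, 1, 0, 0), 120),
    ("p001", "cell_A", datetime(2020, 1, 1, 0, 30), 135),
    ("p001", "cell_A", datetime(2020, 1, 1, 1, 0), 110),
    ("p001", "cell_A", datetime(2020, 1, 1, 2, 0), 90),  # no weather value for this hour
]


def link_weather(reads, weather):
    """Attach the hourly weather value to each half-hourly read.

    Assumption: a read is matched to the hour that contains its period start.
    Reads with no matching weather value get None rather than being dropped.
    """
    linked = []
    for participant_id, grid_cell, start, wh in reads:
        hour = start.replace(minute=0, second=0, microsecond=0)
        temperature = weather.get((grid_cell, hour))
        linked.append((participant_id, start, wh, temperature))
    return linked


linked = link_weather(reads, weather)
for row in linked:
    print(row)
```

Keeping unmatched reads with a `None` temperature, rather than dropping them, makes gaps in the weather coverage visible in later analysis.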
Initially we were just providing surface temperature, but in the next data release we'll be providing up to about 20 more variables, and that's a little bit behind the other datasets because it's really supporting data. Now when people sign up to SERL we ask them to optionally fill in a survey, so that's about 40 questions about the dwelling, the participants and their attitudes and behaviors. Pretty much everybody at least starts the survey and most people complete it, and that's just once, on connection. And finally EPCs, which you might be familiar with: about half of homes in the UK have an EPC, so we source that externally and it's publicly available, but we link all of these datasets together. So now we're going to go into a little bit more detail about the data you can expect. We collect the smart meter data via something called the DCC gateway, so that's a messaging service and that sends us the data directly from each household meter. We get the electricity and the available gas data in half-hourly and daily readings. We also get inventory data, so basic information about the meter, although we're not currently making that available for researchers in UKDS, but it's something potentially for the future. So our team at UKDS collect readings every single day and we then make them available on UKDS approximately quarterly. In terms of the files you can expect, there are those daily and half-hourly files, so these are reads for each participant for each day and each half hour with the available energy data. We also create a read-type summary table, so for each type of reading for each participant there's the amount of data available with each type of error flag that we've created, and some basic statistics such as the minimum and max. We're also creating a participant summary table, so that's a high-level data quality summary for each participant, and that includes non-smart-meter data and basic info.
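To make the idea of a read-type summary table concrete, here is a minimal sketch of how such a summary could be computed from raw reads. It is an illustration under stated assumptions, not the SERL implementation: the row layout, the read-type names (`elec_hh`, `gas_hh`), the flag coding (0 meaning no error) and the function name `summarise_read_types` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (participant_id, read_type, value, error_flag).
# Assumption: error_flag 0 means "no error"; any other value is an error code.
rows = [
    ("p001", "elec_hh", 120, 0),
    ("p001", "elec_hh", 135, 0),
    ("p001", "elec_hh", None, 1),  # a flagged, missing read
    ("p001", "gas_hh", 300, 0),
    ("p002", "elec_hh", 80, 0),
]


def summarise_read_types(rows):
    """Summarise reads per (participant, read type).

    For each group, report how many reads exist, how many carry each
    error flag, and the min and max of the unflagged values.
    """
    groups = defaultdict(list)
    for participant_id, read_type, value, flag in rows:
        groups[(participant_id, read_type)].append((value, flag))

    summary = {}
    for key, items in groups.items():
        valid = [v for v, f in items if f == 0 and v is not None]
        summary[key] = {
            "n_reads": len(items),
            "flag_counts": dict(Counter(f for _, f in items)),
            "min": min(valid) if valid else None,
            "max": max(valid) if valid else None,
        }
    return summary


summary = summarise_read_types(rows)
for key, stats in sorted(summary.items()):
    print(key, stats)
```

A participant-level summary of the kind described could then be built by aggregating these per-read-type entries for each participant.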
And I should say that the energy data includes both the raw data and these error flags we've created, and some basic error correction. In terms of the different read types available, this table shows you all the different read types for the daily and the half-hourly data, with some example values and units, and you can find this table and a lot more information in the documentation we've created, which is on the UKDS website with the study. I'll just say a few things about the sensitivity of smart meter data. Consumers own their own smart meter data. It's personal data, so we write to participants and get their explicit consent for collecting their half-hourly and daily smart meter data, and that's going forward until they withdraw consent or move out, and we don't have a stated end date for that, and then, as I said before, historically back up to a year if possible. So most of our data goes back to around 2019, with a few of the earliest readings from August 2018, and as we'll come back to later, all projects need to have ethics approval from the university and be approved by the SERL Data Governance Board. So the SERL survey is answered online or on paper. It's about 40 questions and it's optional. It's mostly multiple choice, and then there are some derived variables we've added in, so things like the number of adults and the number of people in each age category, and we've also included some error flags and some basic data cleaning. This gives you an example of the types of questions we've asked. A copy of the survey can also be found on the UKDS website with the study. So there are things about energy and heating, including the heating practices and changes to energy efficiency, information about the accommodation, the number of rooms and when it was built, and things about the household, specifically, at the end.
So how they're managing financially, if they've got an electric vehicle, and their working status. And then the energy performance certificate data is the publicly available dataset. It's about 80 variables and about a half of our participants have that data now. So it's a mixture of categorical data, for things like the energy rating, and then numerical data like the total floor area in square meters. The weather data, as I mentioned before, is publicly available and it gets updated every three months. So it's a reanalysis model of climate data. It's at evenly spaced locations, so 30 kilometers apart, and in the datasets each participant has a grid cell variable to link with their nearest data point. So we can do that linking, and we're moving from one variable to around 20 extra variables in the next data release, which will be in a couple of weeks' time. So a few things about the Data Governance Board I mentioned earlier. UCL is the data controller for SERL data. The Data Governance Board, or the DGB, is formed of independent experts, so from industry, government, academic and consumer interest groups. So it's independent from the team at UCL and from SERL. They act as data owner to review and approve data access requests, and UCL acts as the DGB secretariat and we have a technical advisory role. So in order to access the data it will require accredited researcher status. It's only available to UK university employees, for legal reasons, and access is always within the secure virtual lab environment. To give you an idea of some of the projects we're planning or we've started, we've got one project on the COVID-19 impact on energy consumption, to see how lockdowns have impacted how people are consuming energy. We've got a project about smart EPCs, so how smart meter data can enhance energy performance certificates or provide an alternative, in-use energy performance certificate.
And then we'll also be producing an annual report to report on some of our smaller projects and general findings from the data. We're going to be linking SERL with the English Housing Survey, hopefully recruiting some of their participants to match up their EHS data with our SERL data. Some of our consortium partners have got projects as well. So for example Leeds Beckett have a project characterising building thermal response, to understand the time spent in thermal discomfort as people wait for their homes to heat up in winter. And Southampton are researching habitual energy consumption over periods of weeks and months in order to understand the potential for peak demand shifting. So that gives you a flavor of some of the projects we've envisioned, and I'm sure there'll be far more that you might be interested in doing. So if you'd like more detail, we've got our website and email address for inquiries. You can subscribe to our newsletter to get updates about the project, and of course you can check out the UKDS website with that study number there. So far we just have the conference paper about SERL, but we'll be producing a few more publications this year about how the data is collected, and the data descriptor. Hopefully Ellen's given you a flavor of what's available and all the sort of things that potentially you could do with the data. Now I know some of you who are here this afternoon have already gone through the application process and have projects up and running already. So it's really good to see you all here this afternoon, but some of you won't have gone through that process, so this will potentially be all new to you. So I thought it would be useful just to talk through the application process, so what you need to do if you want to work with these data.
And before I get started by talking about the practicalities of applying for these data, I want to just situate it in the context of data access policy, because I think it's important to have a little bit of an understanding of this before you access these data. So essentially data access exists on a spectrum. You've got open data, where the data is not that detailed, it might be aggregate data, and there is no considered risk that any participants in that data could be re-identified, and generally there won't be any restrictions on the reuse of these data. Then you have this middle section of the spectrum, which is safeguarded data, and this is where there might potentially be a risk of re-identification but it is very, very low or even zero. And this is data that's available under our end user license or perhaps our special license. So there's a little bit of paperwork to do, so you need to have authentication and authorization, and you might already be very familiar with that process if you've worked with data through the UK Data Service before. But the third group, which is what we're going to talk about for the rest of this webinar, is controlled data access. Now this is where the data is very much more detailed and there is a risk of re-identification within it, so we require again the authentication and the authorization, but there are additional steps to go through, and that requires approval of the project and the vetting and training of researchers, for example. And I'm going to talk about that in a little bit more detail, but I just wanted to set that context before we get going. So accessing controlled data involves a secure access agreement.
Now I'm not going to talk about the legislation in detail, because I'm not a lawyer at all and I don't think you need to know the legislation in huge detail, but I think it's important to realize that these controlled data, the secure data, are only made available for access through specific legislative acts, and this legislation allows us to provide access to personally identifiable data under a legal gateway. Now those legal gateways vary, and there will be different legislative acts depending on the data source, but essentially they all do the same thing. They determine who can access what data, for what purpose, under what conditions and for how long. And I think that's really the take-home message here that I want you to have. So under these legal gateways researchers access data and they undertake their analyses in a safe setting, and sometimes you might hear those called safe havens or secure labs; there are different terms, but essentially it's a secure setting. And researchers agree to conditions for handling personal data, they agree to breach penalties, they agree to be trained and to become an accredited researcher, and I'm going to explain that in more detail, and they agree that their research projects will be accredited as well. And there is the need for the institution to countersign on behalf of the researcher as well. So that's kind of a really, really top-level explanation just to set the scene, and I'm going to move on and talk about the application process in detail now. So the application process is a multi-stage process. You need to be allowing time for this process, so it's not an overnight job. You need to really allow a couple of months. Researchers have got to be based in the UK whilst they're accessing these data, so just bear in mind that if you are based in multiple locations, access to secure data has to be done whilst you're based in the UK.
You need to apply to become an accredited researcher, which means you have to meet the data owner's criteria and you have to attend a short training course, and again I'll explain a little more about that in a second. And the next stage is to submit a research proposal, and your research has got to have a valid statistical purpose and it has to be feasible, and again I'm going to talk about that a little bit more. But first I want to explain what we mean by accredited researcher. Now this isn't actually a very complex process. You need to submit an application form and you will then need to complete the Safe Researcher Training course, so it's not a particularly complicated process, but you do have to spend a little bit of time completing that form fully. That's probably my number one tip, and you will need to meet the accreditation criteria. Now just to give you a little background on the accredited researcher status. The Office for National Statistics, the ONS, have been given the authority to manage this accreditation process. Now this is something that you only need to do once every five years, so it's not something that you have to do every project or every year. You just do it, your status lasts for five years, and at the end of that five years you'll need to refresh that status. The good thing about having the AR status is that you can use it across all accredited Digital Economy Act processors. Now we at the UK Data Service are one such processor. There are others, for example the ONS and HMRC, so if six months or a year down the line you decide you want to go off and do some research with HMRC, you don't have to reapply for the AR status. Just one other little thing to note: as an AR you will need to agree to your name being added to the UK Statistics Authority website. They are keeping a list of all accredited researchers. Now I mentioned that you have to meet the AR criteria, and this is really what it is.
The first two bits are ensuring that you have the expertise and the experience to actually do the analysis and work with these data. They ask for information about whether you have an undergraduate degree or higher which includes a significant proportion of maths or statistics, or alternatively you can demonstrate at least three years' quantitative research experience. Then the other parts of the criteria are that you have to complete the Safe Researcher Training course and agree, as I say, to your inclusion on the list of accredited researchers. There is the criterion that you need to agree to publish the results of research completed through this scheme, and you have to sign and adhere to a formal accredited researcher declaration. So that's kind of it. Now the Safe Researcher Training is designed around the basis that using sensitive or controlled data is pretty much all about common sense, so if you've got a bit of common sense you're going to be absolutely fine. However, there are some bits of specific knowledge that you will need around disclosure risk and how to mitigate it, and a lot of researchers, unless they have experience of working with controlled data, will not have that specific knowledge, so that's why we train you. It's a short course. At the moment, thanks to COVID, it's online and it lasts about three to three and a half hours, and that kind of depends on the group really: if we have a larger group and we get a lot more questions, it will last a little bit longer.
The course itself will introduce you to the wider context, so looking at understanding data access, looking at the Five Safes framework, which you may or may not be familiar with, and how things might go wrong with data access, and we talk about this concept of safe people. And then we'll introduce the technical knowledge, and this is all around statistical disclosure control, which is the process of ensuring that survey participants are not identifiable in the publication of research outputs. Attendees need to take and pass an online test afterwards. Now the good news is we have an extremely high pass rate, so it's not anything to really get too concerned about, but you do need to pass this test. Okay, so that's all about you as a researcher, and then the second stage of the process is to have your research project accredited. Now as Ellen mentioned, all project applications have to be approved by the Data Governance Board, and I'm going to introduce you to the board and what they do in just a second. I'm going to cover some top tips about completing project applications, but I will say this off the bat: they have to be thoroughly completed, and I'm going to dig into that a little bit more in just a few minutes. In particular you will need to pay close attention to how you're meeting the public good and also providing evidence of ethical approval. Now you should all be doing this at your institutional level anyway, so this should be familiar territory, but I am going to just cover ethics a little bit more right at the end. Now SERL has its own Data Governance Board, and they are a group of highly experienced researchers and data professionals with a vast amount of expertise, and it is their job to review and approve applications, and their aim is to do that through transparency and fairness. Now data governance boards, sometimes they're also referred to as data access committees, but they tend to follow a very similar design, so they'll have a panel membership that will have experienced researchers,
stakeholders; they'll have a secretariat; some will have lay members, some won't; but they will all meet periodically, and it's most commonly monthly. Now some of the data access committees that I sit on have been going for many years, and some of them will have a system of precedents for common project types, so some will have a system where projects may not need to go to a full board review, but that tends to be only committees and boards that have been going for a considerable period of time. And their job essentially is to consider: is what you're proposing an appropriate use of the data? Is it legal, is it ethical, is it feasible? And that's kind of it. Now you will have to provide a set of materials for the panel, which as a minimum will consist of your project application and your ethics assessments. It may be that there are other supporting materials that are appropriate, but that's not in every case. The outcomes that are possible from a review by the Data Governance Board are full approval, or it might be that they give you a conditional approval, where they will come back and ask for a little bit more information or clarification or some amendments. They do also have the option to reject applications. That's not a very common option; the majority will go through either with full approval or conditional approval. Now the UK Data Service, we have a role in this process, and our role is to triage all of your applications before they go to the Data Governance Board, and our aim is to make sure that every single application that goes to the board is of the highest possible quality, so that they all get through with full approval at first review. That's kind of our aim. So when you're applying you will deal with us at the UKDS and we will guide you through the application process. We'll give you any guidance on whether you can improve your application, whether there's anything missing, so we'll work with you throughout that process. So my top tips for a successful project
application which is approved at first review, and I alluded to this earlier: my number one tip above all else is detail, detail, detail. I see a lot of applications where researchers miss sections of the form, and unfortunately we can't submit incomplete project applications to the board. They would be rather annoyed with us if we did that. It's important to realize that the board will very, very closely scrutinize what you plan to do, because they have to make an informed decision, so make sure everything is completed. If there's something that's not applicable to you, make a note of that, that's absolutely fine, but don't go leaving chunks of the form blank. The second and third tips really go hand in hand, and this again is about detail and about clarity. Please make sure that you have outlined a clear research proposal, whether that's having a set of specific research questions or a very clearly defined aim. The more detail the better is the watchword here. The same goes for the methodology. You will be asked to outline your methodology. Now you may not have worked out all the details, but you should at least have a plan of how you're going to get started, and both of these need to apply the first tip: detail, detail, detail. The next tip is to make sure you've included details of all of the data you need. A common error, I think, is for somebody just to put something very broad like "SERL data". You need to be more specific, so if it's data from the UKDS catalog, put the study number, so that will be SN and four digits. The clearer you are here, the easier it is for us to process your application. Let's talk about public good. Now this is absolutely fundamental and it's something that the board will pay close attention to. My top tip here is don't over-promise. We are not expecting you to solve the world's problems, so don't promise that you're going to do that. It's more likely that what you're going to be doing is adding to the existing evidence base or extending our understanding
of a particular issue, and those are absolutely, perfectly valid. We see the odd application form where somebody might put "I'm going to solve this issue", and the board tend to be a little bit unsure about whether that's really feasible for a research project, so be realistic and don't over-promise. The other thing is allow enough time, and I mean this in two ways. The first is don't try and throw the application form together in five minutes. Again, don't spend day in, day out on it, but make sure you spend a little bit of time doing it well the first time. The other thing is you are asked to say when you think the project is going to finish, and it's a common habit to be a little bit over-optimistic about how long you think your project will take. Always allow a little bit longer than you think, because life happens and deadlines get pushed back. My final top tip, again, is take time over your ethics form, and I'm going to talk about ethics now. Ethics forms can be surprisingly tricky, I think. Now you will be used to doing institutional ethics forms, and that is part of the application process, so we need to see evidence that you have gained institutional approval. We also need to see a UK Statistics Authority ethics self-assessment form. Now the UK Stats Authority are really pushing forward with their ethics agenda, and they've developed a fantastic self-assessment tool, and their hope is that they have designed an easy-to-use framework which enables you as researchers to review the ethics of your project for yourself. There are six main principles, and we can see public good, training, legal gateways, all of the sort of things we've talked about this afternoon; they're all part of this ethics assessment. Each of these six principles is broken down into a number of items, and there are 22 in total. If you haven't had to complete one of these before, it is essentially an Excel spreadsheet which looks a little bit like this screenshot you can see on
the screen, and essentially what you have to do is, for each of the 22 items, you have to give your project a score and then you have to provide a justification for why you've given it that score, and the spreadsheet will automatically calculate an overall score, and the Data Governance Board will pay close attention to this. I have to say that some of the items are more self-explanatory than others. Others are a little less clear, I think, and they tend to catch researchers on the back foot. So the first one we can see on the screen is public benefit, and I think most people will find that fairly self-explanatory, but some of the others, like potential harm, can catch people off guard. So my top tip here is read the guidance given by the UK Stats Authority, and I've put the link to that at the bottom of this slide, so when you access the slides after the event you can just click on that link and go to the guidance. If you are unfamiliar with this framework, I would recommend you have a scan through the guidance first, the reason being it will explain each of the items and it will also give you suggestions of the sort of things that you can write for your justification, and I can't stress that enough. I think once you've done one or two of these they become quite a straightforward process, but they can be a bit tricky first off. I think the common errors that I see with these are that people are not really understanding what the item is getting at, so their justification doesn't quite address the right issue. You will need to demonstrate careful consideration of each item. You don't have to write an essay for each one, a sentence can be more than sufficient, but you have to demonstrate this careful consideration. The other thing I want to mention, which I think is unique to the SERL project to a degree, is that we have researchers who work on the SERL project with the data collection as well as working on the analysis of the data, and it's a common thing
that researchers, when they're completing the ethics form, will conflate the two, and they'll answer the items on the ethics form about the data collection, and they shouldn't. It's just focusing on the analysis of the data. So that's really all I want to cover this afternoon. Hopefully that will break down the process and take some of the mystery out of it, and help you to put together an application which whizzes through review on the first go.