I work for Jisc. We provide IT services across universities, colleges and, well, schools now. We do things like Janet, which is the network that all the universities run on. I've been involved with aggregate data since about 1995, starting out on census data, but recently moving on to international socioeconomic data. I'm going to apologize now: this is a bit of a dry presentation, as there's a lot of information here on the data that we hold, but at the end I'll do a quick demo, just to lighten the proceedings. So we have data from some gold-standard international organizations: the IEA, the United Nations, the World Bank, the OECD and the International Monetary Fund. The data we hold is aggregate, so it's counts of items at a geographic level. It's mainly national level, and we have a small amount of sub-national data as well, at regional level. It's all time series. Some of it goes back to before the war; I don't think there's any energy data that goes back before the war, but it goes back many, many years. It's updated yearly on the whole, we have some quarterly and some monthly data, and we have some data that is a bilateral flow, so movement of objects or people or, in this case, energy between countries. From the World Bank, we have the World Development Indicators. These are a set of indicators designed to monitor how countries are developing. There's a whole raft of data on greenhouse gas emissions, 42 different variables, access to clean fuels and technologies, access to electricity in urban and rural environments, and use of alternative and nuclear energy. We've got energy production and use, so looking at access to electricity and the use of renewable forms of generation, and we've also got a few items on transportation, so pump prices for gasoline and diesel, air transport passengers, railway lines, distance, passengers and the amount of goods transported, which might be of some use to you.
From the OECD, we have a number of datasets. In the environment statistics, we've got the Green Growth Indicators, which are a set of indicators for monitoring progress towards green growth, to support policymaking and inform the public. That database covers all OECD countries, plus a few accession countries and key partners, including Brazil, China, India, Indonesia and South Africa. These indicate whether economic growth is becoming greener with more efficient use of natural capital, and capture aspects of production which are rarely quantified in your normal economic models. So there's CO2 productivity, for instance GDP per unit of energy-related CO2 emissions, and energy productivity, and they have a breakdown of the percentage of energy consumption used in agriculture, services or retail business. There are 138 variables in that particular dataset. There's a whole raft of other datasets there. I'm not going to go through those, other than to say that the Structural Analysis database, STAN, which is a bilateral flow of trade in services and goods, includes electricity, gas and water movement between nations. UNIDO data: this is the industrial statistics database based on the International Standard Industrial Classification of All Economic Activities, that's quite a mouthful, Revision 4, INDSTAT 4. It holds details on manufacturing, and in that it covers coke oven products, refined petroleum products, etc. The meat of it is the international energy data, but the IEA and the UNIDO data are behind a login. They're freely accessible to anyone with a .ac.uk email address, so if you're at any university or college you can access this data, but you just have to log in; the login is just there to keep members of the public on the other side of the firewall. So, the IEA data: we have a full release for all countries in the world, or all reporting countries in the world, in August and September, and then in April and May we get an update for OECD countries.
CO2 emissions: six datasets dealing with the generation of CO2 and other greenhouse gases from electricity and heat generation. There's detailed CO2 emissions data by sub-sector and by product, and the indicators are CO2-related energy and socioeconomic indicators. Coal information: another six datasets, with full balance data for different types of coal and coal products, including manufactured gases; the calorific values used to convert physical tonnes of coal and coal products into energy; supply and consumption statistics for different types of coal and coal products, including manufactured gases again; and detailed coal export and import data by country of destination or origin and, again, by type of product. Electricity: production, imports, exports and distribution losses; production by nuclear, hydro, geothermal, solar, tidal, wind, heat pumps, combustible fuels (and the combustible fuels are broken down into different products as well) and others. Again, there's detailed export and import data by country of destination and origin, and a monthly balance of net electrical supply and net electrical capacity by the type of energy used to create that supply or capacity. Oil information: average conversion factors from tonnes to barrels for OECD countries for 24 products, so for instance naphtha, aviation gasoline, kerosene, diesel and many others. Crude supply, so crude oil supply and demand: there's indigenous production, imports, exports, stock changes and refinery intakes; detailed oil export and import data as well, by country of destination and by type of product; and product supply and consumption in the form of supply and demand balances, gross output, recycled products, imports, exports, transfers, stock changes and international marine bunkers. Natural gas: again, detailed imports and exports by destination and origin, and natural gas statistics on production, imports, exports, stock changes, stock levels and consumption.
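As an aside on the conversion values mentioned for coal above: converting a physical quantity into energy is a single multiplication by a net calorific value. The sketch below is illustrative only. The product names and calorific values are made up for the example and are not taken from the IEA datasets; the only fixed constant is the definition of a tonne of oil equivalent as 41.868 GJ.

```python
# Hypothetical net calorific values in GJ per tonne (illustrative, not IEA figures).
NCV_GJ_PER_TONNE = {"coking_coal": 28.0, "lignite": 10.0}

# 1 tonne of oil equivalent (toe) is defined as 41.868 GJ.
GJ_PER_TOE = 41.868


def tonnes_to_gj(product: str, tonnes: float) -> float:
    """Energy content in GJ of a physical quantity of a product."""
    return tonnes * NCV_GJ_PER_TONNE[product]


def gj_to_toe(gigajoules: float) -> float:
    """Convert gigajoules to tonnes of oil equivalent."""
    return gigajoules / GJ_PER_TOE


lignite_gj = tonnes_to_gj("lignite", 100.0)   # 100 t at 10 GJ/t
lignite_toe = gj_to_toe(lignite_gj)
```

In practice the calorific value varies by product, country and year, which is why the datasets publish the conversion factors alongside the physical quantities.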
Renewables: electrical capacity by type of renewable energy, balances by product type, flows of production, supply, imports, exports, aviation and marine bunkers and stocks, and also energy supply and consumption statistics in matrix form. World energy balances and so on; I'm not going to say too much about these. They can quite often be summaries of data in other parts of the IEA data, but there are net calorific values, energy balances in matrix form and 50 energy, economic and coupled indicators. The world energy statistics: oil demand by type of oil, net calorific values again for conversion factors, and detailed energy use by product. World energy prices: exactly what it says on the tin, it's world energy prices. There is some subnational data in this. Only 15 countries report subnational data, and that's at region level, a region for instance being New South Wales, California or Texas. And the last one: energy technology research and development, so it's the amount spent, in millions of US dollars, on different energy research and technology areas, and that's by region or by country. There is also one last one. Energy prices and taxes: these are datasets on costs of crude oil importation and exportation, spot prices in various markets around the world, and energy prices by country as well. And projections, I forgot about projections. Projections have come in for a wee bit of debate recently; they're seen as being rather outdated by the real world, but these are projections of costs, supplies, economic indicators and energy balances by country. And a picture just to finish, and then I will do a quick demo of .Stat, which is our interface to all this data. So, .Stat is not a very exciting page. I will go into the international energy data. There's a login there; it will send you to either Athens or UKAMF. Just log in with your local user ID, and here we have some data. All our data comes with metadata here: an abstract, how to cite the data, and detail.
Unfortunately, I can't show you my web browser; I can't find a way to do that. It's not showing in the list. If you go to that address, you can have a play around anyway. Any questions? You can phone us, email us, or we've got an online query platform; you can just enter your queries into that. So, I think that's probably a good time for me to hand over to Simon Elam, and he'll introduce you to the SERL project.
Okay, thanks, Dave. So, I'm Simon Elam. I'm a principal research fellow at University College London and I'm also the director of the Smart Energy Research Lab. So, for the Smart Energy Research Lab, or SERL as you'll hear us refer to it, our vision is basically that it will provide a secure data portal for UK researchers. It'll grant access to high-resolution energy data and it'll facilitate innovative energy research for years to come, I hope. It's complemented by our own multi-disciplinary research programme, but it is a portal that's accessible to the wider UK research community. It's a five-year project that started in 2017, ending in August 2022. We've got £6 million of funding from EPSRC, and there are seven universities and the Energy Saving Trust in our consortium. I'm not going to go through them all. It's led by UCL, and the other important partner, for today certainly, is the University of Essex, who take care of some of the outbound data governance and are also responsible for developing the technical environment, which is what Darren will be telling you about in a few minutes. So, why do we need SERL? It has historically been, and currently is, very difficult for researchers to get access to good-quality demand-side energy data in the UK. Dave told you about some excellent data sets that are a little bit more focused on the macro side and on the supply side in general. SERL is very different.
We're very much focused on the demand side and very much on domestic buildings, because that's where smart meters are being deployed. We will look in the future at non-domestic buildings, but at the moment the focus is very much on domestic buildings, people's houses. So, we recognize that smart meter data could be a game changer because it provides access to this high-resolution, i.e. half-hourly, energy data, and that's simply not been available to us as researchers in the past, or those data sets have been extremely difficult to get hold of. However, there are still some substantial barriers to accessing smart meter data. The technical environment is quite difficult to build. There are a lot of legal hoops that you need to jump through, and basically it costs quite a lot of money as well. And so, that's why SERL was funded by EPSRC to be a central resource for the UK research community. They recognized that it would be very expensive, and a lot of duplication of time, effort and money, to build different portals if each university was looking to access smart meter data for its own projects. So, a central resource made much more sense. And as I've hinted at, the focus of SERL is enabling research that is investigating energy demand or consumption, or perhaps energy behaviors, in domestic buildings: the type of research that requires granular household-level energy data. That's why there are quite a lot of issues with gaining access to smart meter data: it's personal data under GDPR, the data protection regulations, and therefore could be disclosive, which is why we provide access to SERL data via a secure lab setting. So, what data do we have?
We're providing access to pseudonymized data in a secure lab environment, as I've said, and what we've got is smart meter data, so that's daily and half-hourly energy consumption data, both electricity and gas, where homes have both of those meters and both of those supplies, which is about 90% of homes. We've got the tariff data, so we know how much people are paying for their electricity and their gas, and we also have some additional technical data that comes off the smart meters, which is very useful to researchers, particularly when they're doing the detailed type of analysis that we expect people to be using SERL for. And the other really key thing is having contextual data. In the past, any smart meter data that's been released has come with very, very limited contextual data, and it is the contextual data that really tells you a lot about why people are consuming the energy. So, the smart meter data will tell you how much and when, but you need the contextual data to try and untangle a little bit about why, and what the drivers of energy consumption are. So, we currently have a short SERL survey, which is completed by all our participants, which are households. That provides some really useful household information, some socio-demographics, and it also tells us some characteristics of the physical property, the building itself, so what the walls are made of, what type of boiler they might have, and so on. We can also get access to Energy Performance Certificate data, and we link that to the other data we hold in SERL at a household level or a building level, an address level. And we can also link in some weather data, so things like external temperature, solar irradiance, and so on. So, a little bit about the overall system. Everything you see on the left-hand side of this diagram is the smart meter system that's being rolled out across Great Britain.
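The linkage of half-hourly readings to contextual data described above can be pictured as a join on a pseudonymous household identifier and a timestamp. The following is a minimal sketch under stated assumptions: the field names, identifiers and values are all hypothetical, weather is simplified to a single national series rather than being matched by location, and nothing here reflects the actual SERL schema.

```python
from datetime import datetime

# Hypothetical half-hourly readings keyed by a pseudonymous household ID.
readings = [
    {"household": "H001", "start": datetime(2020, 1, 1, 0, 0), "kwh": 0.21},
    {"household": "H001", "start": datetime(2020, 1, 1, 0, 30), "kwh": 0.18},
    {"household": "H002", "start": datetime(2020, 1, 1, 0, 0), "kwh": 0.40},
]

# Hypothetical survey answers (the "why" behind the consumption).
survey = {"H001": {"wall_type": "cavity"}, "H002": {"wall_type": "solid"}}

# Hypothetical outdoor temperature in degrees C, keyed by half-hour start.
weather = {
    datetime(2020, 1, 1, 0, 0): 3.5,
    datetime(2020, 1, 1, 0, 30): 3.2,
}


def link(readings, survey, weather):
    """Attach survey fields and outdoor temperature to each reading."""
    linked = []
    for reading in readings:
        row = dict(reading)  # copy so the input records are left untouched
        row.update(survey.get(reading["household"], {}))
        row["outdoor_temp_c"] = weather.get(reading["start"])
        linked.append(row)
    return linked


linked = link(readings, survey, weather)
```

Each linked row then carries both the "how much and when" from the meter and some of the "why" from the contextual sources.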
So, that's the infrastructure that we're leveraging in order to collect the smart meter data and bring it into the Smart Energy Research Lab. So, the blue box on the left-hand side is the home area network, so that's where an electricity meter and a gas meter have been installed in the house. That comes with a comms hub, and you also get an in-home display, the IHD. So, the comms hub then communicates with the DCC gateway, or, let's put it slightly differently. Let's jump to the right-hand side. The research portal network is everything that we're building. So, that's our databases and, on the far side of that green box, the research portal itself, which is the secure lab environment by which researchers can access the data that we have stored in our databases. Our technical infrastructure sends a message through the DCC adapter, the small orange box there, to the DCC gateway and says, please connect me to this particular comms hub so I can collect the smart meter data that's on there, and it sends the data back through the DCC gateway and back into our system. So, the DCC gateway is essentially a secure messaging service that allows us to access the smart meters for which we've had consent; we collect the data, it comes back, and we store it in our database. So, a little bit on our design framework. I'm not going to go into too much detail on this, but in very simple terms, the data comes in at the top. We manage it. We have these two principal research functions, which are the observatory and the laboratory, and then we provide data out at the bottom to researchers via these different areas, I guess, of that output layer. Of the two research functions, the primary one that I'm talking about today is the observatory function.
So, that's where we will be recruiting around about 10,000 homes, which will be representative of Great Britain, and we provide the data sets that I talked about on the previous slide. We provide those to researchers, and that's on an ongoing basis. So, that's smart meter data we're collecting every day, so it's a longitudinal observational panel, and it's really there for observational studies. So, the ethos is that we don't do anything else to these participants. They've filled in a short survey at the beginning, and about once a year we may ask them to complete a follow-up survey, but we're not doing any active research with those participants. If you want to do an intervention study, so you want to test whatever it is, whether it's a new tariff, a piece of energy-efficient equipment, or the effect of heat pumps or solar panels, then you can do that via the laboratory function, whereby a research project can go and recruit its own participants. It might use a randomized controlled trial design, and then we will collect the smart meter data for that research project and provide it to you, linked in with the other data that you need for that particular research project. So, that's the key difference between the laboratory and the observatory functions. Okay, just some ideas of the type of research that will be facilitated by SERL. So, we have our own research program. I'm not going to go through each of those in detail, but I'll talk briefly about the top one, because it's obviously quite topical. We recognized that the COVID-19 lockdown was going to be probably the biggest natural experiment since the Second World War in terms of its impact on energy demand. We very quickly decided that we wanted to study the impact of this, both in the short term and the long term.
So, we quickly put together a survey for our existing SERL participants, and we sent that out to them to ask them how things have changed. Obviously, people are working from home much more. There might be new members of the household during the lockdown period, or there might be fewer people in the house, because people are self-isolating somewhere else, or something like that. And, of course, we have the energy data before, during and after. So, we keep collecting the smart meter data, which means that we're in a really good position, or quite a unique position, to be able to do this kind of detailed investigation of the impact of COVID-19 on domestic buildings and households across the UK. As you can see, there's a bunch of other projects there that are part of our research program. But, as I said, the SERL research portal is open to all. So, there's lots of other potential research that could be facilitated by using the SERL research portal. A few examples are there. There are many others, and these could be your project. So, perhaps looking at how socio-demographic factors impact on energy demand profiles, looking at fuel poverty, or looking at the distributional impact of switching suppliers or switching tariffs, for example. So, a little bit about how we recruit our participants. Our aim is to recruit 8,000 to 10,000 participants, which are households across Great Britain, to our observatory panel by the end of 2020. As I said, the aim is for that to be representative of Great Britain households. So, we do quite a bit of work in terms of stratifying our sample and also looking at the results when we get them back, to analyze any skews. We conducted a pilot phase last year in August and September, and that was, again, a stratified random sample of all households with a SMETS2 meter, which is a meter that we can actually access via the DCC gateway that I showed you on a previous slide.
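The stratified random sampling mentioned above can be illustrated in a few lines. This is a generic sketch, not the SERL recruitment procedure: the strata (a made-up `region` field), the equal allocation per stratum and the household list are all assumptions for the example.

```python
import random
from collections import defaultdict


def stratified_sample(households, key, per_stratum, seed=0):
    """Draw up to `per_stratum` households at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for household in households:
        strata[household[key]].append(household)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = min(per_stratum, len(members))  # small strata are taken whole
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 50 households in one region, 30 and 5 in others.
regions = ["North"] * 50 + ["South"] * 30 + ["Wales"] * 5
households = [{"id": i, "region": region} for i, region in enumerate(regions)]

sample = stratified_sample(households, "region", per_stratum=10)
```

A real design would more likely allocate in proportion to the population in each stratum and then reweight the responses to correct any remaining skew.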
From that, we recruited 1,700 households, and that pilot phase was pretty successful; it did exactly what we needed it to do to inform the main phases of participant recruitment, which will begin shortly. So, Wave 2 will go into field work in August and September, and we aim to get roughly another 4,000 participants, and then roughly another 4,000 participants in Wave 3, which will be towards the end of the year, and that will take us up to our target of around about 10,000 participants. And for the laboratory projects I mentioned, if you're doing a laboratory project, then you recruit your own participants. You include consent for us to collect the data on your behalf, and then we'll collect that data and provide it to you. So, a little bit about data governance. This is very boring, but very important in terms of SERL. It's absolutely fundamental to our development and operation of the research portal that we get this right, and so we've expended a lot of time and effort over the last couple of years making sure that we do. In a simplified version, I'll divide this into inbound governance and outbound governance. So, inbound governance makes sure that the data we collect is via informed consent from our participants, these households across Great Britain that have volunteered to provide their data. That's then fully compliant with the Smart Energy Code, which is the UK regulation that governs any access to smart meter data. And we also have to be compliant with all the data protection regulations, which is GDPR. And then the outbound data governance ensures that only projects that have been approved by our data governance board can access data, and they do that via a secure lab using the Five Safes protocols. So, the Five Safes protocols were developed by UKDS alongside ONS, I think. And so that's basically: safe people.
So, all researchers must obtain ONS Accredited Researcher status before they're allowed to get access. Safe projects: as I said, projects must be approved by our data governance board. Safe settings, which is the secure lab environment itself. Safe data: again, we make sure that the data we provide to researchers is appropriate for those secure lab environments. And finally, safe outputs, which means any data that's taken out of the secure lab environment must go through statistical disclosure control before it's released, to make sure it's fully anonymized or fully de-identified. In terms of our data provisioning, we provide research data sets. There will be regular updates to those data sets, which will be on either a monthly or a quarterly basis; we haven't fully decided on that yet. We also put a lot of time and effort into robust data quality assurance. So, we provide a data QA report with every release, which will provide a summary for researchers of the data quality issues that can be found in the data. There's no such thing as a perfect data set. What we're trying to get is perfect oversight, or as good oversight as we can, of any data quality issues. We add some data quality flags and scores to the data sets, and we also add some derived variables relating to data quality that researchers can use when they come to conduct their analysis. Our first exploratory analysis data set was released in June. The second release will be coming within the next couple of weeks, by the end of July. That adds in more of the smart meter data than was in that first release, and it also adds some of the other contextual data that wasn't in the first release. In the first release it was only the survey data. In this next release we'll have EPC data and weather data in there as well.
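To make the idea of data quality flags and scores concrete, here is a minimal sketch for one day of half-hourly readings. The flag names, the checks applied and the completeness score are assumptions chosen for illustration; they are not the flags or derived variables that SERL actually publishes.

```python
def quality_flags(readings, expected=48):
    """Flag each half-hourly value and score the day's completeness.

    `readings` is a list of kWh values, with None for a missing half-hour.
    Returns (flags, score) where score is the share of expected slots
    that hold a usable value.
    """
    flags = []
    for value in readings:
        if value is None:
            flags.append("missing")
        elif value < 0:
            flags.append("negative")  # consumption should not be negative
        else:
            flags.append("ok")
    score = flags.count("ok") / expected
    return flags, score


# Hypothetical day: 46 good readings, one gap and one implausible value.
day = [0.2] * 46 + [None, -1.0]
flags, score = quality_flags(day)
```

Keeping the flagged rows in the data set, rather than silently dropping them, lets each researcher decide what level of data quality is acceptable for their own analysis.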
And then, shortly following the second release of the exploratory data set, we'll be doing the first release of the permanent collection, which is the data that you guys can access, and that should come in August. And then the secure lab environment, so your channel for accessing the data. The UKDS Secure Lab is available now. That's the existing secure lab environment. But we are also currently developing, and will soon have operational, our own research portal, which is a secure lab environment slightly different to the existing UKDS Secure Lab but run very much on the same principles. So we've currently got a beta version in test, which is being tested by the SERL consortium, and the full research portal should be available in August or September. So here's a little bit of a summary of where we are now with the project as a whole. We've established the technical infrastructure, so the DCC adapter service, a portal for our participants to manage their own settings, and the data ingest and management processes. It's taken an awful lot of time and effort to get the technical infrastructure to where it is now. And following that, they are currently working on the research portal, which will be released soon. We've also had to spend a huge amount of time on the data governance framework and establishing a fit-for-purpose data governance board. As I said, that's the boring stuff that we've had to go through, but it's been absolutely critical to getting this project up and running. And finally, the SERL research program is underway as well now. We've got six projects that have been commissioned, and they're in the early stages of getting going. But we are a live project, as I said. We recruited 1,700 households in the pilot phase, and that data has been collected and managed for the last few months.
And as I said, that first exploratory data set has been published via the UK Data Service. So the next steps, as previously mentioned, are to recruit the remaining participants, the remaining roughly 8,000 participants for our observatory panel. The research portal should be live and released soon. The research program will be continuing. And the Data Governance Board will start to review research project applications very soon. They're actually reviewing the first three later this week. So if you have a project and you want to use SERL to access data for that project, then please start applying, or start thinking about your project application, now. And if you need more information on SERL, then there are some links to our website there. You can also see some quite detailed information about the data via the UKDS catalog records. So the exploratory data set has got study number 8643, and there's a link to it there. And as I said, that information will get updated with our second release later in July. So if not all of the documentation you need is there now, it will be within the next couple of weeks. And then the permanent collection will follow fairly shortly. And then on the right-hand side there, you've got the website address and an email address. If you've got any questions or you want to contact us after this webinar, please do so. And with that, I'll hand over to Darren.
Right, can I just check first that you can see the presentation okay? Yes, I can. All right, excellent. So, third time lucky. All right, okay, hello everyone. My name is Darren Bell. I've been the Associate Director of Technical Services at the UK Data Archive since January, and I've been working closely with Simon Elam and UCL as the research partners on the SERL project now for a couple of years. Now, I am conscious of time, but I'll try and get through these relatively quickly.
So we have time for some questions at the end. But essentially, if you want to be really reductive about the whole Smart Energy Research Lab infrastructure, you can actually boil it down into about six icons. Essentially, we have a central data store. On one side of that, we collect participant consents, because obviously we can't collect data without their consent. Once we've got that consent, we then actually pull the data from the smart meters in their households. And that's all stored in a central Hadoop cluster at the UK Data Archive. And we present that through a researcher portal and on to the researcher. So in a nutshell, although there are a lot more moving parts in the system than are represented here, this is essentially what the entire SERL infrastructure consists of. Okay, so the first part of this really is capturing consents. This was a bit of a departure for the UK Data Archive, because traditionally we're a data repository and we're not really involved in the full life cycle. So this has been interesting for us, to be involved in the survey design, capturing consents, and the whole data management life cycle. So we went live with something called the participant portal last August, which was really to capture the initial wave of consents from people. So people come to the site if they've received a letter, they enter their special code, and then register that they're giving their consent for us to capture their data. Once they've actually done that, their consent data is stored on Amazon Web Services, which is a cloud environment. So there are a number of moving parts in this. We use something called Amazon Cognito to capture and store all their encrypted credentials, and the APIs and data storage are all held on Amazon as well. So our approach these days is very much to have a cloud-first environment and infrastructure wherever possible.
And this is something we've managed to do on Amazon Web Services. You'll notice it's a serverless architecture as well, because this is the new hot thing. You don't actually manage any physical machinery, or what we traditionally understand as physical servers or hardware. Everything is effectively abstracted away. So all of these services, and all of the interactions between the client and the server end, don't require any hardware of our own. So the data store at the heart of this is Hadoop. It's a separate webinar to go into what Hadoop is, but essentially it's a big distributed computing environment. Currently we have about two petabytes of storage available at the archive; because of replication and other things, there's probably about 1.4 petabytes of usable storage in that. We are expecting, obviously, tens of thousands of participants over the next 18 months, so we do need a good deal of capacity there. Hadoop essentially allows us to chain lots and lots of different machines together and aggregate that into one single storage point. So the main parts of Hadoop that we're using, because it is a big stack of software, are something called HBase, which is the real core database that allows us to store billions and billions of rows of data, with a very flexible schema, so we can add and remove columns on the fly as we need to, and something called HDFS, a distributed file system where we store all the original XML files that come into the system before they're changed into database rows. So, yeah, it's a very complex system, but it does allow us to scale up very quickly without having to inject lots of new hardware all the time. On top of HBase, which is the core data store, we actually use something called JanusGraph. Now, some of you may be familiar with graphs as a modern database technique. Without going into too much detail, it effectively allows us to query interconnected nodes, if you like, interconnected points of data.
So if we wanted to do longitudinal analysis on connected variables, that's something that would be a good model to use. So we use something called JanusGraph on top of HBase. As well as the core data store, we also have a system called Apache NiFi, which manages all the data flows going in and data flows going out in a very coordinated and structured way. So this is just one of our schematics for how we manage onboarding participant data. Very briefly, in this diagram the participant will log on to the participant portal and give their consent. At the back end on Amazon Web Services, that sends something called a message, which is basically a small fragment of text that comes back to the UK Data Archive, and then we begin onboarding the smart meter data from households on behalf of the user. Once those schedules, as we call them, are set up, that smart meter data is then retrieved overnight, daily. For every household there could be up to six or seven different devices, whether that's electricity or gas meters or smart meter hubs. So at the moment we're running about 1,700 participants, and in total we have about 8,000 of these schedules that come in overnight. As Simon said, we're hoping to scale it up to 10K by the end of the year. So that involves having a system that's very flexible and quick to react in terms of scale. The way this works at the moment is that we chain lots of machines together, effectively to function as a single ingest point, and as the capacity needs to grow, we just add more machines behind what's called a load balancer. So we can ingest larger and larger amounts of data without the system constantly falling over and crashing. Building resilience into the system is also important, so we have a number of dashboards and alerting systems that help us keep on top of the data flow.
So what you've got here is just some sample screenshots from one of our dashboards, called Grafana, which allows us to monitor how much data is coming in at any particular time and where thresholds might be that might trigger an alert and tell us to do something to fix the system. So bringing in the inbound data has not always been straightforward. The Data Communications Company gateway, or DCC gateway, is relatively new for people outside of the big six energy companies. So the data itself actually comes in as XML files, and these have to be parsed, and you have to go through these in quite some detail and flatten them out. So there have been a number of challenges: duplicate postings that come back from smart meters, and often missing postings. I don't want to name names on a public webinar, but there are certain smart meter manufacturers who have been more problematic than others. We have something called alert storms, where we can literally have tens of thousands of alerts coming back from the smart meters. So no data, but it's almost like having to sort the wheat from the chaff. Fortunately a lot of these problems have now stabilized and those issues have been resolved. So we have a relatively stable data ingest structure now. So we are now focusing on the front end. I mean, the majority of the project for the last couple of years has been setting up the back end, the data stores, the data transfer mechanisms, the consent and data governance, but now we can focus a bit more on the front end. The research portal is in development. This is one of the early wireframes, but this is what the actual site looks like at the moment. So we'll be allowing people to complete all their project applications online, and having done that and gone through a data governance board, they'll then be able to retrieve their data through a secure lab environment, which is going on the cloud.
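Going back to the inbound XML handling described above, the following sketch shows one way to flatten readings into rows, drop duplicate postings and spot missing ones. The XML layout, the attribute names (`device`, `ts`, `wh`) and the half-hourly interval are assumptions made for illustration; the real DCC message format is not shown in this talk and will differ.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Hypothetical payload: one duplicate posting and one missing slot (01:00).
SAMPLE = """
<readings device="E-001">
  <reading ts="2020-06-01T00:00:00" wh="120"/>
  <reading ts="2020-06-01T00:30:00" wh="95"/>
  <reading ts="2020-06-01T00:30:00" wh="95"/>
  <reading ts="2020-06-01T01:30:00" wh="110"/>
</readings>
"""


def flatten(xml_text):
    """Turn nested XML into flat (device, timestamp, value) rows."""
    root = ET.fromstring(xml_text.strip())
    device = root.get("device")
    return [
        (device, datetime.fromisoformat(r.get("ts")), int(r.get("wh")))
        for r in root.iter("reading")
    ]


def drop_duplicates(rows):
    """Keep the first row for each (device, timestamp) pair."""
    seen, unique = set(), []
    for row in rows:
        key = row[:2]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def missing_slots(rows, step=timedelta(minutes=30)):
    """List expected timestamps between first and last reading that are absent."""
    times = sorted(ts for _, ts, _ in rows)
    if not times:
        return []
    present, gaps, current = set(times), [], times[0]
    while current < times[-1]:
        if current not in present:
            gaps.append(current)
        current += step
    return gaps


rows = flatten(SAMPLE)
unique = drop_duplicates(rows)
gaps = missing_slots(unique)
print(len(rows), len(unique), gaps)
```

In this sample the four parsed rows reduce to three after de-duplication, and the 01:00 slot is reported as missing.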
The UKDS Secure Lab, as Simon mentioned, is there and the data will be available in there as well, but we are migrating across with a cloud-first approach to make sure these secure desktops come online as well. So in order to keep those secure we do the usual things. We encrypt the data at rest. We also have multi-factor authentication as well, which requires you to enter a one-time code. Anyone who's used a banking application will be familiar with that kind of authentication. So yes, we're working on that actively at the moment, and that should be ready towards the end of August or in September as the first minimum viable product. So finally, a few key activities. Yes, we're preparing for wave 2 onboarding and scaling up for hopefully up to 5,000 households. We're implementing infrastructure for those cloud-based secure desktops. I think Simon mentioned as well, we did perform a COVID-19 survey, which closed last week, so we've just ingested the data for that now, which will provide some really rich contextual data for smart meter readings along with weather data and EPC data, and we're working on getting the research portal ready for our researchers. Okay, so that's it from me. I think we've got about five minutes for questions.
So I don't know if anyone has actually submitted any at the moment. I'm just looking for any questions. Okay, I can't see the written questions, but if you do want to ask a question, by all means type something into the chat or into the questions element on the webinar console.
Can I just ask the first, please: what energy data are available via the IEA website and OECD iLibrary?
I think that's one for me. We take all the IEA and the OECD data, but it's in our own platform. So all the data that is available there is available within .Stat, which is our platform.
We actually work with the OECD and the IEA. The .Stat platform is a community-wide platform that the OECD developed in conjunction with ourselves and a number of other international data aggregators. I hope that's answered your question.
Okay, has anyone else got any questions?
There's a question about a deadline on the SERL project application, so I can answer that. So there's no deadline. The only time constraint is that projects must complete by August 2022, which is the end of the funding for the first kind of cycle of the SERL portal. I mean, we hope it's going to be extended after that, but for the moment that's the only real time constraint. It does take a little bit of time, though, to get your project applications through quite a rigorous process, which is pretty much the same process that all Secure Lab project applications go through, so that's likely to take a month or two, and you also need to get your researcher accreditation. So if you wanted to start soon, then I'd advise getting in your project applications fairly promptly.
I think you're also asking: can you tell us more about the present 1,700 households, their geographic representation and socio-demographics?
So yeah, first of all, we've stratified by region and by Index of Multiple Deprivation, so that in terms of our kind of outbound sample we're representative on those scores. And then when we got the data back, we were able to compare the socio-demographics of the pilot phase panelists with other sources of information, like the EHS, the English Housing Survey, the census and so on, to look at any skews in our data. So there were some skews in terms of our socio-demographic representation, but what we're able to do then, in the second and third waves of participant recruitment, and we're starting wave two very shortly, is oversample for those socio-demographic archetypes, for example, where we were underrepresented. For example, in terms of the Index of Multiple Deprivation, what we found was that the response rate was lower in the more deprived areas and higher in the less deprived or wealthier areas. So what we can do is then adjust the number of invitation letters we send out to those regions, so that we should get something back which will ultimately mean that we're very balanced. The only other thing to say about the pilot phase was that Scotland and the north of England were not represented at all in our pilot phase, and that was due to a technical issue with the smart meter rollout in those northern regions of the UK. Again, that will be corrected in the second and third waves, so that when we get to the end of this year we'll have a panel which reflects the geographic distribution of the UK, or of Great Britain.
What is the next question? I'm just going through this other list. There was a question for you, Darren, about JanusGraph and what kind of nodes you're pulling together?
Yeah, essentially what we're doing here is mostly around variable information. So this allows us to effectively class different types of variables together, either by geolocation or conceptual keywords. So we have something called a variable cascade. So within any particular dataset there will be a particular variable, and what we can do is link those variables together between two datasets, if you like, and link those into concept schemes as well, so that we can do kind of traversal between different variable linkages. So that's essentially what's going on there at the top layer in the graph.
So I think there's just another couple of ones here. There's another question about the application form, which I can take, for SERL. So the application form for SERL is all within the UKDS system. So what you need to do is register for a UK Data Service account, if you don't already have one, and when you get that you set up a new project, which just takes a few minutes within the UKDS system, and you then would link to the SERL dataset. Now at the moment it's only the exploratory dataset, so you don't want to link to that, you don't want to bring that into your project. You want to wait until August, when the full dataset is released. So when that happens you would link your project to that dataset, you say, I want to use this data for my project, and you'll then get all of the steps that you need to complete, including the accredited researcher status, and you'll then get the right project application form for SERL, and so that's when you complete the project application form. So it's all within the UKDS system.
There's another question here I've got from Karen Denison: is there any potential to link the data to other data, such as administrative records? So yes. I mean, well, Darren, you might give the technical answer. In terms of the kind of data governance and the kind of research answer, yes, there is the potential to do that, absolutely.
That's something that we have set up SERL to do. So, as I covered before, the datasets that we've already linked in, such as energy performance certificates, are actually administrative datasets, so that's pre-linked for you. If you want to bring in other datasets, then that is perfectly possible. You'll put that on your project application form, and that will get reviewed by a data governance board, just to check that there aren't any particular data governance issues with linking that administrative data to the smart meter data and the other data that we have. Providing there are none, which there usually wouldn't be, it will get approved by a data governance board and then you can link in that data. And I don't know, Darren, if you want to comment about the kind of technical infrastructure for the way that's enabled?
Yes, I mean technically, obviously that would require some kind of linkage keys that we'd be able to perform those linkages on, but yes, that would be taken on a case-by-case basis really, as part of any standard project submission.
Okay, so there's another one here: is there an application form for SERL projects? The SERL website mentions AR status, etc. Yes, so I just covered that. As I said, that's through the UKDS system. Yeah, okay, so if you've got any others, then...
Is there a direct link to the ONS accredited researcher status application? So I think you mean by that... well, certainly the accredited researcher scheme, all researchers are required to have an AR number in order to submit a project. So if you're talking about a direct kind of web link or something, no, not at the moment, I think. Well, yes, so you can just Google the ONS Research Accreditation Service, RAS, but I think there is a link to it on the SERL web page. If you go to the researchers tab or researchers page on the SERL website, there is a link to the ONS Research Accreditation Service there. Okay, absolutely no information on the process. All right, there's another one
here, Simon, about how long in advance do I need to register an account to access the SERL database. Well, once you've submitted your project application, we usually say that's about 4 to 6 weeks before that project would get approved, but you also need to do some work before submitting your project application. You need to get approval from your university research ethics committee, as you do for any research involving human participants. The fact that we've got their energy data means that this is research involving human participants. As a rule of thumb I would start the process 2 to 3 months, probably 3 months, before you actually want to start analysis and actually start using the data. For the research accreditation process you need to go on a training course and pass an exam, and you need to get your project approval, and it's very deliberately a very rigorous process that the UK Data Service have developed over many years for all Secure Lab projects, so it does take a little bit of time. As a rule of thumb I'd allow yourself 3 months before you want to start using the data, to get the project approved, to get your ethics approval and to get your research accreditation.
Okay, all right, I'm kind of very conscious of time now, so I think we'll have to wrap up there. Okay.