Hello everyone, and welcome to this webinar about the NERC Digital Solutions Hub. My name's Richard Kingston; I'm Professor of Urban Planning and Geographic Information Science here at the University of Manchester.

What I want to talk to you about today is the NERC Digital Solutions Programme. It's a five-year, eight million pound investment programme to build what UKRI, UK Research and Innovation, call a national facility. The Digital Solutions Hub is in phase one at the moment. It's funded by NERC, the Natural Environment Research Council, and it's supported by NERC's five data centres, shown across the bottom of the screen there. If you search for the NERC Environmental Data Service, you'll be able to access data from these five data centres.

One of the interesting things about this programme is that our core focus is to work closely with stakeholders outside of academia, so across the public, private and third sectors. We also think there are a lot of academics and researchers outside environmental science and climate science, which are NERC's core users really, who would benefit from what we are doing.

So this is all about the fact that these five data centres, between them, curate over 40 petabytes of environmental data. This data is mainly used by climate scientists, environmental scientists and physical geographers, but there's real value in it for others outside those domains, particularly when we think of organisations within government, whether that's national, regional or local, but also the environmental sector, whether that's in industry or the third sector. So there's huge potential. The primary aim of what we are doing here is to make better use of the data that sits in these five data centres.
Now, when we proposed what we were going to do to NERC, we said: whilst this data is a great resource across those five data centres, where you really get benefit from it is when you combine it with a whole range of other social, economic, health and other environmental data. What we're doing at the moment is focusing just on the UK, and that's the whole of the UK: all four nations, from a national level down to country level, down to regional and local authority level, and much lower down to neighbourhood scale, depending on who the users of this hub may be. Obviously some of the data that NERC holds goes outside of the UK; it's global. In fact, a lot of the British Antarctic Survey's data holdings cover, not surprisingly, Antarctica, but also the Himalayas, and a lot of things around Greenland and the Arctic, around ice melting and things like that.

The other important thing about what we're doing is that this isn't just some huge data portal; it's not just a place where you go and find catalogues of data. This blue image in the top right-hand corner is NERC's supercomputer, called JASMIN. The hub will sit on top of JASMIN, and what that allows is something quite unique that a lot of our potential users haven't had before: high-performance computing to do analytics and modelling on this data. We're currently running until 2025, and the idea is that, hopefully, if we do this correctly, we will then get additional funding for another five years to keep this facility running.

So, one of the reasons why we were awarded this. It's quite an old film now, but I don't know if any of you have ever seen the Kevin Costner film Field of Dreams, where they talk about this idea of 'if you build it, they will come'. Well, we're not doing that. That overall approach would be if we built it, i.e.
we built some kind of digital solution, this Digital Solutions Hub, and then we'd go out there to the community and say, look, we built this great thing, you should really use it. Well no, no, no, that is not the way you do this. We could have spent four years out of five building this, and in the final year gone out to all of these users and said, look at this great thing, it can do A, B and C. But what users then end up telling us, and we know this from previous work, is: well, great, it can do A, B and C, but we also wanted it to do X, Y and Z. And at that point you've run out of money and you can't respond to that requirement.

So we've shifted our approach and decided that you need to put users front and centre of this. Going back to when we were all in lockdown, a lot of my time was spent talking to all of these organisations. These are the big players at the national level across the UK, but there were also lots of smaller organisations, particularly local authorities and smaller environmental consultancies, the kind of organisations that could really benefit from using NERC's data and integrating it with their own data. So lots of conversations with lots of organisations but people have funding that we've got.

So, as I said earlier, rather than building something first and then going out to potential users and saying 'look at this great thing', we turned it the other way around. We said we'll first of all start off with some context scoping: map the landscape of stakeholders and understand what some of those users want, and that was again some of those initial conversations during lockdown. Then we had a really intensive phase from September 2022 to February 2023, where we went around the whole of the UK doing face-to-face workshops. We targeted very particular organisations, but the workshops were open for anyone to come, and I'll say a little bit more shortly about the kind of people that came along.
So we did all of that work last winter. Then, through March to July, all of the knowledge and information generated from those user workshops around the UK was analysed and brought together, and we developed what are called user personas and identified different scenarios of what users want to do with this data. That report is available through our website; there's a link in the bottom corner of the screen there. What we're doing now is going back to those users, the ones who ticked a box on their paperwork saying they're happy for us to follow up with them, and digging much deeper into their requirements. Two of my colleagues, who are my postdocs, are doing one-to-one interviews. They're starting in the next few weeks and happening between now and the middle of December, where we really dig down into: well, you told us this, what do you actually mean by that? What exactly do you want this hub to do within the context of what you're doing?

In parallel to that, we've obviously got a lot of software development ongoing; we're testing and trialling different approaches. With potentially up to 40 petabytes of data available to users, we can't second-guess exactly which datasets users are going to want to use, and that creates a few challenging issues about how you ingest that data: how you extract it, transform it and load it into this hub. This is at a time when NERC are also thinking about net zero and not wanting to shift vast amounts of data from a data store into some other environment. And then the idea is that from the second half of next year (again, my little face in the corner is covering this up), the second half of 2024, we'll go out doing user acceptance testing with those users from the workshops, and then we'll move on to rolling that out to a much broader range of users during 2025.
So these are some of the headline outputs from what users told us, and some of the problems they face. They really struggle with the fact that data is held in lots of disparate places, whether that's within their own organisation, in lots of other organisations, or on national government sites like data.gov.uk. Data is also not held in formats or in systems that make it easy to search and find. Often, if you don't know exactly what you're looking for, it's difficult to find relevant data. It's not always obvious what the purpose of different platforms is, or the variety of data that they contain. Over the last decade or so there's been pretty big growth in the number of data platforms, not just at national level in this country but in other countries, and a lot of the users we spoke to said they felt inundated with so much data that they didn't really know where to start. It's hard to keep up with the sheer range of different portals, different hubs, different places where they could go to find data.

They also found some of these platforms rather clunky, difficult to learn and difficult to use. It would take them multiple clicks to actually get to the data they wanted, and they found a lot of cumbersome registration processes. And we find this with some of the tools we've developed: sometimes I'll get emails from people in local government saying they can't access tools outside of their organisation because of firewall restrictions and things. So there were a lot of practical problems that we'd not really thought about that they were raising through these workshops.

In terms of the headline key requirements they were asking for: users want sufficient access to the data to be able to quality-assure it, and to make sure they can clean it and transform it into suitable formats for their particular use.
What this comes down to is that they want to be able to first preview the data, to get an idea of what that data is: what's the metadata, where does it come from, what's its provenance? They don't want to have spent a lot of time getting this data and doing things with it, only to then realise it's not quite the dataset they wanted. So, certainly, previewing a sample of the data to help them determine whether or not it's suitable, in a far easier way than is possible at the moment. Normally you'd have to download or access a full dataset, whereas they may want to preview a subset of it to get a feel for it and understand whether it's useful for their needs.

They also want to be able to keep track of the work and the resources they've used, and what sort of processes have been applied to the datasets. The idea there is that when you log in to the hub you would have your own workspace, and it would keep track of your workflows and what you had done previously. So when you go back in, it hasn't completely forgotten the way you work.

They're also keen to avoid what they refer to as duplication of effort, through sharing the work that they've done on datasets and accessing the work others have done. There was lots of discussion around this. One example: if you're in the south-west of England and you work in a local authority, they were quite interested to know whether someone else in another part of the UK had been doing similar work, and whether there was a way of sharing that, with some kind of peer-to-peer learning system built into the hub, which was again something quite interesting that we'd not really thought about.
They also wanted to know the most suitable ways of applying analysis software to data. So what was the best toolkit? Should it be spatial analysis software? Should it be something that allows them to do coding? Do they want Jupyter notebooks and things? So what's the most suitable software they should be using for the kind of analysis they want to do? This again builds on previous work done here at the University, something called MethodBox, which was developed in computer science, where you can put in what your criteria are and it will tell you the most suitable method to apply to that data.

Another key requirement was allowing users to combine and link their own data with other data as part of the analysis: so an area where they can upload their own local data that they don't want to share or allow others to see.

What we've done with that, working with Open Data Manchester, who ran the workshops with us, is develop this idea of a user journey. Over a hundred people turned up to the workshops from different types of organisations, and we could place all of them on this user journey. Some users went through the whole journey, but others would only be part of it. So some users might just be defining a problem, which then gets passed on to another part of the team who get the data and analyse it, and others might take that analysis and then showcase it and publish it.
One example of that was a couple of people who worked in a local authority in the south-west of England. They were responsible for telling stories with data, to convince the public why a decision had been made. One way we can do that is through something called story maps, where you're integrating analytics within GIS and within mapping tools, but you're also telling a story around that to help support a particular decision that has been taken. So one of the things we definitely know we're going to be doing is integrating StoryMaps, from Esri, the GIS software company, on the hub, because this came up a lot around the country: people want to visualise the data in a way that is useful to a much wider audience.

We were also able to put our users into what are called personas or archetypes: the analyst, the author, the leader, the investigator, the specialist and the data steward. Again, we've got a report that you can access through our website. It's about 80 pages long, but if you're really interested in this, it digs down into what all of this means.

We also put all of this data together. This isn't publicly available, but we've got a rich data source covering all of the users. They're all anonymised, but we know the different types of organisations they came from, what data they used and which departments they were using data from, and this is an interactive tool that we can use to really understand how users are using data and what they're using it for. So it's an interactive dataset with over a hundred users from 84 different organisations; these are people who came to our workshops.
A really important thing that we're doing is following what are called the FAIR principles: making sure that the data is findable, accessible, interoperable and reusable. I don't know if copies of these slides will be made available, but each one of these is hyperlinked here, so you can dig much deeper and find out a lot more about the FAIR principles. Without this the hub just will not work. You've got to make the data findable and accessible in the first place, so that even if you don't quite know what you're looking for, you'll be able to find things.

A good example of this: NERC have what's called the Data Catalogue Service. If you type something into that search, so here we've got 'where will the hottest place in the UK be in 2030', it gives absolutely no results. That's because it was not designed to answer this kind of question. What you'd have to put in here is 'UKCP data 2018', that's the UK Climate Projections data for 2018, and it would then probably give you a list of data around UKCP18.

So what we have done is take a different approach, using a large language model. We've trained that large language model, which is Google's BERT model, on all of NERC's metadata records; that's for those 40 petabytes of data. So in our approach, when you ask 'where will be the hottest place in the UK in 2030', it doesn't give you the answer, but what it does do at the moment is give you a list of datasets that will help you answer that question. We're obviously still working on this; the next steps would be to allow you to click on one of these links and get a preview of that data, and then make the decision as to whether or not you want to bring that data into the hub to do some analysis.

So I guess an important thing to say here is that this is not just a data portal. As I said earlier, it will run on the JASMIN supercomputer, and the fact that the vast majority of this data is
spatial means it lends itself to some kind of GIS approach. We're also integrating this not just with the NERC data but with the other environmental, social, economic and health data, allowing users, under that first FAIR principle, to find and explore data.

We're building on things we've done before. This is an example of work that a colleague, Sarah Lindley, and I developed; we've been doing this for about 10 years now, and we're rebuilding it using UKCP18 data. It will look very different to this, and will be relaunched early next year with updated climate change projections. So we're building on things, testing things and trying things, and basically developing a set of what we call proofs of concept. This is to allow users to test different data types and formats, test out different types of functionality, and meet some of the user requirements.

We're doing this through four broad use cases: one around air pollution and health; one around housing and environmental constraints; another around flooding, particularly flooding in coastal communities, bringing in tidal surge and sea level rise; and another around heat stress. Each one of these brings in a range of different data types and integrates the different data centres. Having spoken to a lot of these big national organisations and government departments, we know that these are things they are interested in. So it allows us, the software engineers, to build something and try things out, but also to produce data products and tools that we know some users will find very useful. But that's not everything; this also has to be quite a general system as well. So, using maybe the same methods, tooling and analytical approaches, you could slot in another use case there, depending on what your job role is and what you're trying to do with data. I'm not going to go
into that today. So, if you want further details, the report from the user engagement workshops is now available on our website. There's an online webinar with Open Data Manchester on the 20th of November; you can find out more about that if you sign up to our newsletter. That'll be a two-hour online event, and we'll talk in a lot more depth about our user engagement process. So yeah, more on the website. Thank you for listening.