My name is Matt Fry from the UK Centre for Ecology and Hydrology. I'm really very pleased to be hosting this first webinar in our AI for Environmental Science series, which is supported by NERC and the Constructing a Digital Environment programme. The programme aims to develop a digitally enabled environment benefitting researchers, policy makers, businesses, communities and individuals alike. This is the seventh of a fantastic set of webinar series that the programme has been organising, and in this one we'll be considering the role and opportunities, as well as some of the pitfalls, in the use of AI in environmental science. The format of the webinars is to have an invited presentation from leading experts in the field, followed by a chance for Q&A afterwards. Just to give a bit of background to the programme: advances in digital technology have led to a rapid increase in the volume of data being captured, curated and managed on a daily basis. Alongside this, new technologies have enabled a step change in global capacity for integrated monitoring, analysis, modelling and visualisation of the natural environment, at potentially transformative spatial and temporal scales. By harnessing these advances in technology, and the UK's leading position in both environmental observation and computer and data sciences, there's an opportunity to create a digitally enabled environment. This is something that can be achieved through approaches such as integrated networks of sensors, both in situ and remote-sensing based, together with methodologies and tools for assessing, analysing, monitoring and forecasting the state of the natural environment at higher spatial resolutions and finer temporal scales than previously possible. The programme has done a huge amount of work in this area, including this seminar series, through a range of projects and other activities.
Can I invite the audience, if you haven't already, to subscribe to our YouTube channel (you can see that at the bottom right of the slide there)? You'll be able to see all the talks to date, and the upcoming talks will be put on there as well. I think Josie is going to paste the link in the chat too. So, this series is about AI in environmental science: it's the seventh webinar series, focusing on the development, use and application of artificial intelligence techniques in environmental science. AI tools are enabling new analytical value to be delivered from existing sources of data, as well as providing powerful tools for gathering new data. This series covers activities across all of these areas, and we're trying to bring in lots of different areas of environmental science and lots of different methodological approaches. We've got some really fantastic talks coming up over the next couple of months, so please keep an eye on the programme as it continues. Today's presentation is from Dr David Topping of the University of Manchester and the Alan Turing Institute, and he'll be talking about AI in environmental sciences: from research developments to underlying infrastructure and policy implementation. David's research interests focus on building computational models of atmospheric aerosol particles, for use in interpretation of measured properties and as sub-models through incorporation into climate models. The research area is highly multidisciplinary, covering physics, chemistry, numerical methods and computational science. In addition, David's work includes evaluating how machine learning might mitigate existing complexity bottlenecks in atmospheric modelling, experimental data analysis and impact assessment.
He collaborates on methods that combine air pollution data with human symptomatic responses, and he's also co-director of a new programme to develop underlying infrastructure that will connect environmental data with other domains and support further development of AI and associated movements, including digital twinning. In terms of questions and answers, please feel free to post any questions into the Q&A section, via the button you should see at the bottom of the Zoom window, rather than the chat; we'll collate these after the talks. I should also note that we're recording the talks, so thanks, Josie, for that. So, without further ado, over to you, David, thanks very much.

Thank you very much, Matt. Hi everyone. I'll share my screen and go through the steps to get this up and running; one second. Okay, so hopefully you can all see that. Well, it's a real pleasure to be invited to give this talk, with a ridiculously optimistic title; I think we could have many talks covering different aspects of these slides. But it's certainly an accelerating area of interest, and one that I think is key, looking at it in the round, to taking environmental science into the next decade and even longer. I think we all appreciate that the direction of travel we face as a research community is changing, as are the pressures on us and on the people we wish to benefit from our tools, and the timescales are changing too. And thank you for the introduction, Matt, I really appreciate that. I'm a physicist by trade; I did a PhD in atmospheric science what feels like a long time ago now. My movement into the data science sphere probably mirrors that of many of my colleagues in the community, in that, from a scientific perspective, I became quite frustrated that we had this increasing complexity in physical and chemical understanding, in the atmosphere in particular.
We hit barriers with regard to how even very robust traditional tools were unable to integrate all of this complexity and really, truly understand change. So is data science able to capture that and take us forward? Like I said, the talk has a particularly broad title, and I'd like to break it down into four areas and hopefully hear your feedback on all of the above. The aim is to provide a view of this end-to-end process that integrates AI and environmental science domains, and I'm using "end-to-end process" and similar terms interchangeably, because I think the drivers that require implementation of data-driven services at the policy interface will help us rethink what we need as a research community, and the requirements we need to request in terms of support and technological development moving forward. In the first area, I'll briefly talk about broad developments in AI for environmental science. Obviously there are hugely exciting developments in this space, and it's great to see that this webinar series will present some of those; this can only be brief, but I want to frame it in the sense of moving on to the second, third and fourth sections of the talk, which aim to present this end-user perspective. As an example, in the second section I think it's nice to use cities, environment and net zero: a really interesting and fast-moving case study that brings together the academic and non-academic stakeholders who are already placing questions on the responsibility we have to deliver new insights from a research environment into active policy implementation. The third area follows on, where we think about our community and reassess the importance that we place on underlying infrastructure. Infrastructure is such a broad term, but specifically I want to think about digital infrastructure, and how this brings with it an ecosystem that we sit in as researchers, as active scientists in this field.
So if we're really looking to the future, spoiler alert: it is absolutely bright and exciting. I think we just need to reassess what we value in terms of that supporting infrastructure, without seeing this as any barrier to innovation, and I think we can start framing some of those conversations in a really useful way, as the people who understand the science behind this area. For the first section, on environmental science: we've known for a very long time that environmental science covers many spatial and temporal scales, and we already have a very diverse set of data-driven challenges. My head is still in the early 2000s, and a paper from 23 years ago, I think, summarises quite well the broad narratives around AI integration in environmental science; we're hitting all of the platforms that we expose ourselves to at the moment. Of course, that narrative is about the importance of data and our human role in environmental change, which ultimately requires a multidisciplinary approach, breaking out of these traditional research silos. The data science process on the right-hand side is one we're all familiar with; it's what we're trained to do as scientists, from undergraduate level upwards, but perhaps we haven't referenced it in the same way. I think our narrative is changing to take a data-driven approach: collecting data, cleaning data, understanding data, using insights from data to tweak theoretical understanding, developing new models and new insights, and then revisiting the need to generate more data. That has obviously been fantastically effective; it's got us to the point where we know the impacts that we're having, and these are evidenced, visualised and delivered to stakeholders. I guess if I was being controversial, and of course self-critical,
one might say that these traditional approaches are perhaps too linear, and the required connections between too many silos often happen through chance: serendipity occurring through communities that work closely together. I think emerging data science methodologies could perhaps short-circuit some of these processes. That sounds very negative, but of course, as we understand global change, we're under increasing demand to arrive at solutions quicker, and the solutions have to be robust. When we talk about AI in any area of science, we often reflect on the volumes of heterogeneous data that need to be integrated into decision making at scale, the increased frequency of the solution requirements, and the heterogeneous nature of the data we have access to, often with the facets of big data placing new constraints on us. We also have experience, from a numerical modelling perspective, of many cases where a mechanistic framework joining an environmental process with its associated impact simply doesn't exist. Some areas are perhaps easier to consider in this regard than others; again, in my own particular case, linking policy intervention, through environmental change, to human health outcomes is a perfect use case for data-driven techniques. The list of bullet points provides some more examples here, such as the fusion of Earth system observations from remote sensing platforms, the requirement to deal with unstructured data, and of course the cross-linkage with socio-economic systems. At the moment, an interesting narrative that brings some of these challenges together is of course the pathway to net zero. Whatever we think about whether that term will still exist in a few years, it's already leading to a range of potential interventions being formulated.
And the questions that frame these interventions are important in the sense of understanding where data science could be used to connect, for example, improved health outcomes with a climate policy, whilst we try to avoid the unintended consequences of change. As another example within the atmospheric and air quality sphere: we want to avoid a "dieselgate" in rolling out these local interventions, and whilst it might traditionally have been hard to pull together the evidence base to inform those rapid decisions, can AI, and more broadly data science, help us in this regard? This requires a multidisciplinary approach; we talk about this all the time, and delivery of that approach can be challenging. We often focus heavily on model development when it comes to AI, whether it's deep learning or more traditional areas of machine learning. What I'm trying to get at in this talk is that it absolutely is going to be essential to embrace broader data science movements, but we do need to consider the underlying data infrastructure, from metadata standards to trust, and then think about how we reframe our data infrastructure as scientists. As we continue our environmental research projects and programmes, it's always interesting to see the direction of travel with regard to new technologies and how they may become more widespread and more accessible to us. If you're not familiar with the Gartner Hype Cycle: every year this is published for a range of different technological disciplines, and it gives a qualitative schematic of the progress of a particular technology, from a point of innovation, where a small community takes that technology forward, to a peak of inflated expectations as the technology becomes more accessible to a broader practising community.
Then we start to really understand the true use of that technology as we approach what's called the trough of disillusionment (you could use that term for a Monday as well), and eventually expectations rise once again as the community of practising scientists, and perhaps policymakers, realise the technology's true use. The reason I've put it here is because I think there are some really interesting directions of travel in the 2022 report (2023 isn't available yet; I think it's around June time). We see reference to accelerated artificial intelligence automation, of course a hot topic at the moment in the present climate of large language models. But what does this mean for us? This area covers causal AI, which identifies and uses cause-and-effect relationships to go beyond correlation-based predictive models. We see transformer architecture models such as large language models, generative design AI (also known as AI-augmented design) and machine learning code generation. Of course, it could be a very long time before some of these tools become widespread, but actually, when you look back through subsequent years of this hype cycle, and at what's happening in the environment space now in terms of research publications, we're already using some tools that were pointed out a couple of years ago. So it's interesting to see where we might head, but again, what I'm thinking about in this talk is what we need to do as a community to embrace it, and embrace it in a trustworthy and effective way. There are obviously some really exciting examples of the use of AI in the peer-reviewed literature across the community, where adoption of these emerging data science methodologies means, for example, that monitoring technologies in the environment can integrate onboard classification models.
A basic example that we were involved with is where a Swiss company has developed, on the back of industry-standard deep learning architectures, some really nice transfer learning approaches and an ability to detect airborne pollens in real time, replacing the laborious approach of manual offline classification and daily concentrations. By connecting the outcome of that classifier to a real-time database, we were able to deliver interesting and important information on pollen to people who might be interested in associated health outcomes. So this is a good example of that end-to-end process we talked about before, and of the nice interplay between academia and industrial development. In 2019, and you'll probably all have seen this figure before, there was a fairly seminal paper presenting a commentary on machine learning integration being the key to taking large-scale models to the next generation: models that can integrate diverse data sources to deliver new insights. This is such a broad area of development that it requires shifting paradigms in terms of the programming environments we use. Do we want to rewrite all of our large-scale models in another language, one of those languages that can integrate machine learning and process models in a single framework, thus moving to physics-informed systems and increased trust? On the other side we see a fairly contentious diagram, but nonetheless one that I think presents a schematic of the interplay between black-box, grey-box and white-box models in delivering new insights, hopefully driven by the push towards explainable systems. I don't necessarily agree that black-box models necessarily produce high-quality results (personal opinion), but again, I think it shows that what we're doing as a community is being pragmatic, whilst being cautious, in integrating effective technologies in a unified way.
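To make the transfer learning pattern concrete, here is a minimal, hypothetical sketch: a pretrained backbone is kept frozen (simulated here by a fixed projection, since real systems use deep convolutional networks trained on large generic image sets) and only a small classification head is trained on the new, smaller labelled pollen dataset. The data, backbone and class labels are all synthetic stand-ins, not the actual commercial system described in the talk.

```python
# Sketch of transfer learning: frozen pretrained backbone + small trained head.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def pretrained_backbone(images):
    """Stand-in for a frozen deep network mapping raw inputs to features.
    In practice this would be a CNN pretrained on a large generic dataset;
    here it is a fixed (never retrained) random projection."""
    w = np.random.default_rng(42).normal(size=(images.shape[1], 16))
    return np.tanh(images @ w / np.sqrt(images.shape[1]))

# Small labelled dataset (synthetic): three hypothetical pollen taxa,
# shifted so the classes are actually separable.
X_raw = rng.normal(size=(300, 64))
y = rng.integers(0, 3, size=300)
X_raw += y[:, None] * 1.5

# Transfer learning step: extract frozen features, train ONLY the head.
features = pretrained_backbone(X_raw)
head = LogisticRegression(max_iter=1000).fit(features, y)

def classify_pollen(image_batch):
    """Real-time classification: frozen backbone + lightweight trained head."""
    return head.predict(pretrained_backbone(image_batch))

print("training accuracy:", head.score(features, y))
```

The design point is that only the small head needs labelled pollen data; the expensive representation learning is inherited, which is what makes onboard, real-time classification feasible.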
I think I'm in a safe audience here to suggest that there are perhaps more positive movements in that space from applied scientific domains than you would otherwise see from some fundamental publications. The choice of machine learning architecture for a particular problem of course requires testing effective outcomes according to the question you're trying to answer, but now we're already seeing the use of automated workflows that do this for us. This slide was provided to me by a colleague who's been using automated machine learning, through a community package now being co-developed with Microsoft, I understand, to integrate different components of remote sensing data. The product is much larger than the sum of its parts here, using multiple data streams on trace gases to improve predictions of PM2.5 from an existing global model. I think that in itself is interesting on a number of levels, because in theory it could remove some of the programming requirements from the domain science perspective, whilst of course still requiring us to test and validate in a robust way. I think it also shines a light on our ability to increase the value of the extra products that we capture, and in this case, therefore, to revisit what we want to invest in, and continue investing in, from a monitoring perspective. And we see examples of crossing those multidisciplinary silos that were otherwise quite difficult. This is an example of work that we were involved with through the SPF Clean Air programme, which develops the data integration model for exposure. This integrates data on daily travel patterns and activities with measurements and models of air pollution, using agent-based modelling to simulate daily exposure.
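The automated model-selection workflow mentioned above can be sketched as follows. This is a hedged illustration using scikit-learn's GridSearchCV as a stand-in for a dedicated AutoML package, with entirely synthetic data; the predictor names (trace-gas columns plus a model PM2.5 prior) are hypothetical.

```python
# Sketch of automated model selection: try several model families and
# hyperparameters against a cross-validated score, keep the best.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n = 500
# Hypothetical predictors: e.g. NO2, SO2, CO columns + a model PM2.5 prior.
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + X[:, 3] + 0.1 * rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = [
    (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    (RandomForestRegressor(random_state=0), {"n_estimators": [50, 200]}),
]

best_score, best_model = -np.inf, None
for est, grid in candidates:
    # Each search cross-validates every hyperparameter combination.
    search = GridSearchCV(est, grid, cv=3).fit(X_tr, y_tr)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print("selected:", type(best_model).__name__,
      "held-out R^2:", round(best_model.score(X_te, y_te), 3))
```

As the talk notes, this removes some programming burden but none of the validation burden: the held-out score is the minimum check, and domain-specific evaluation still has to follow.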
On this slide is a prediction from the SPENSER model, developed at the Alan Turing Institute and used during the COVID response to understand, by LSOA, where people spend their time, by demographic and age group. Once you have access to that information, you can map it onto predicted pollutant fields or measured fields and really break down the predicted (and I stress it's a predicted) personal exposure estimate by micro-environment. There are massive caveats here in terms of the availability of data in those environments, but again, it shows an example of that workflow: developing a framework that can frame predictions in a way that's useful for people who have to think about effective policy implementation. And one could go on; there are many examples. In this graphic we see a schematic that captures the different facets of data science, and we can pinpoint really interesting developments in our own environmental sphere, from machine learning and deep learning, such as methods to identify extreme weather events or land use types, through to developments of physics-informed models to really target that bottleneck. Of course, when we talk about AI we're not just talking about model developments either, but also the integration of insights delivered through those models in an automated way, and there are large global developments in this space. I've copied an example here from a recent United Nations report, talking about programmes to design more energy-efficient buildings, to optimise renewable energy, and so on. Underpinning all these developments there's a family of methods, on the bottom left, from semi-supervised learning through to probabilistic models, generative methods and so on, with the associated family of programming environments. It's important to keep an eye on that domain knowledge requirement, and we need to retain it as we move forward.
One critical component of this slide, however, is this idea of the underlying infrastructure. It's now relatively "easy", and I say that in quotation marks, for at least programming-literate researchers to access fairly complex machine learning libraries and move towards development of new models. But when we talk about environmental science, we want to connect our research with improved outcomes for us and the planet, and this is where I think the infrastructural requirements are really drawn out. This is why, in the second example, we look at a use case with cities and net zero: the timescales for implementation of technologies and interventions can be really rapid in this space, and are dictated by a broad range of stakeholders in and outside academia (dominated by those outside academia, perhaps). In many ways cities act as a theatre for very many environmental and societal challenges, and already we see interventions being proposed and rolled out. Delivery of interventions across these natural and social systems is, however, frequently held back by a limited evidence base. This could be the result of insufficient data, and also of limited understanding of how individual systems interact with each other across sectors and scales. So there's already pressure to adopt rapid solutions, and bringing that scientific evidence to bear on understanding change is obviously essential. We do this manually with a collection of data-science-driven tools, existing modelling tools and existing insights. Here we have an example where, through the development over many years of this living lab approach, we were able to study the environmental change associated with the unfortunate progression of the pandemic: in this case the improvement, or at least change, in air quality, a change that was seen to be associated with huge changes in urban mobility and activity.
What this small use case demonstrated is that we were able to combine time-series, data-driven forecasting techniques, driven by historical measurements of nitrogen dioxide, with access to data on traffic systems, to look at what the concentration should have been under normal operation. This slide is showing the measured data in red, the forecast using that traffic data in green, and the forecast in blue assuming normal operating conditions. What we found was that even these new data-driven modelling techniques were not able to fully reconcile the measured concentrations with the reduced traffic, and existing modelling techniques of course helped us tease apart the complex interplay between potential changes from local interventions and what can't be controlled outside a regional boundary. It was essential that we framed the insights from these tools using current understanding of the scientific processes involved, even down to the level of individual instrument response functions. Data science methodologies that have been around for a while helped us confirm source changes in particulate matter when total levels didn't change: we saw increased signatures from wood burning (garden waste wasn't being collected, so people were burning it in the evening); there were major moorland fires; and there were changes in the nature of the traffic, with an increased ratio of HGVs in the morning as we started to order more things online, despite the heavy reduction in total traffic. Of course, this is all well and good, but it's desirable to automate this type of workflow so that more agile developments and insights can connect scientific-grade tools to people who can implement change. Here we see the rise of digital twins as a general narrative, offering an alternative framing for the infrastructure we've built so far.
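The counterfactual comparison described above (measured versus "business-as-usual" forecasts) can be sketched roughly like this. All the data here are synthetic, and the model choice is illustrative; the real study used forecasting models trained on measured NO2 and traffic counts.

```python
# Sketch of a counterfactual NO2 analysis: fit NO2 on historical traffic,
# then compare a forecast fed with observed lockdown traffic against a
# business-as-usual forecast fed with typical traffic levels.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)

# Historical training period: traffic flow (vehicles/hr) drives NO2 (ug/m3).
traffic_hist = rng.uniform(500, 2000, size=1000)
no2_hist = 5 + 0.02 * traffic_hist + rng.normal(0, 2, size=1000)

model = GradientBoostingRegressor(random_state=0)
model.fit(traffic_hist.reshape(-1, 1), no2_hist)

# Lockdown period: traffic drops sharply; "normal" is the historical mean.
traffic_lockdown = rng.uniform(200, 600, size=100)
traffic_normal = np.full(100, traffic_hist.mean())

forecast_lockdown = model.predict(traffic_lockdown.reshape(-1, 1))
forecast_normal = model.predict(traffic_normal.reshape(-1, 1))

# The gap between the two forecasts is the traffic-attributable NO2 change;
# any residual gap to the MEASURED data points at other processes
# (regional transport, other sources), as discussed in the talk.
print("mean NO2, traffic-aware forecast:", forecast_lockdown.mean().round(1))
print("mean NO2, business-as-usual forecast:", forecast_normal.mean().round(1))
```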
On the right-hand side, we see a useful diagram that differentiates data flows between physical and digital objects, where the digital twin has a fully connected data flow and is able to implement change in the physical object. Digital twins have been of interest for a long time in the engineering sphere, for example in work around jet engines and predicting risk, but now we see lots of conversation, some of it, I have to say, probably a little bit too optimistic, when we look at the natural environment, in the sense that we need to understand the value of near-real-time data and the changes that can be made in both human-driven and natural systems. There are some interesting potential uses here. Air quality and health outcomes could be an interesting development, particularly in controlled environments like indoors, where you can imagine connections between building ventilation, activities, air quality and the like. And of course digital twins could be used to play back multiple scenarios, to simulate potential impacts before implementation in the real world. It's an exciting development and it's absolutely creating huge momentum, so it is taking us down a direction of travel whether we like it or not, and we need to embrace where it might be of use. But even with a digital twin, we have a range of examples where, perhaps controversially, we can say that open data by itself is not enough, and we can even stretch this to potentially concerning outcomes for tools built on that open data if we don't have appropriate provenance. We already have examples at the local environment and policy interface, where, in this example, we see provision of data streams by a third-party provider that were openly available. Because of course environmental science, AI and digital twins are not just developments that we can control in academia; there are solutions being built that provide potentially quick answers to local policy providers.
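As a toy illustration of that data-flow distinction, here is a hypothetical sketch: a "twin" of an indoor space that both mirrors sensor readings (the physical-to-digital flow a digital shadow would also have) and feeds control decisions back to the physical object (the flow that makes it a twin), plus a what-if playback on a copy of the state. The room model, thresholds and numbers are all invented for illustration.

```python
# Toy digital twin: two-way data flow plus scenario playback.
from dataclasses import dataclass

@dataclass
class Room:                       # the physical object (simulated here)
    co2_ppm: float = 800.0
    vent_rate: float = 0.1        # fraction of air exchanged per step

    def step(self, occupancy: int) -> float:
        self.co2_ppm += 40.0 * occupancy                         # emission
        self.co2_ppm -= self.vent_rate * (self.co2_ppm - 420.0)  # exchange
        return self.co2_ppm                                      # sensor reading

class DigitalTwin:
    def __init__(self, room: Room, target_ppm: float = 1000.0):
        self.room, self.target = room, target_ppm
        self.state = room.co2_ppm

    def update(self, reading: float) -> None:
        self.state = reading                 # data flow: physical -> digital
        if self.state > self.target:         # data flow: digital -> physical
            self.room.vent_rate = min(0.8, self.room.vent_rate + 0.1)

    def what_if(self, occupancy: int, steps: int) -> float:
        """Play back a scenario on a copy, without touching the real room."""
        sim = Room(self.state, self.room.vent_rate)
        for _ in range(steps):
            sim.step(occupancy)
        return sim.co2_ppm

room = Room()
twin = DigitalTwin(room)
for _ in range(20):
    twin.update(room.step(occupancy=4))
print("CO2 after control:", round(room.co2_ppm))
```

A digital shadow would keep only the `update` assignment; the feedback line and the `what_if` playback are what the talk's diagram reserves for a full twin.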
This was simply a case of an air quality data stream being made openly available during a low traffic neighbourhood scheme. We found that the data stream was just not right; it had zero provenance. It was taken offline, and that completely locked out community groups and the local council authorities. It further sowed distrust and pushed back against the appetite for delivering interventions that are ultimately designed to benefit all. I think this moves us nicely on to discussing supporting infrastructure in this broader AI and environment sphere. I'm really excited by the development of new tools, but I'm not convinced we fully appreciate all the infrastructural needs, particularly if we're talking about the interplay involved in implementing change. And I'm not suggesting this is something we all need to take part in, because of course it's very easy to talk about infrastructural developments; this ultimately requires huge amounts of software engineering and architectural design. But I think we do need to review how we value maintained data platforms and agreed metadata standards. For example, are there enough robust labelled datasets out there if we're developing new models and digital twins? What are the standards that we need to follow? And of course this is an interesting landscape in itself. The Urban Observatory team did a project last year, and an ongoing project as well, for the Department for Transport. The Urban Observatory is an EPSRC-funded initiative, and we did a review of metadata standards and data accessibility at this interface of environment and mobility. You find a very mixed bag, and this can vary by region. I think a nice outcome was that the environmental sector, at least, is one in which the adoption of metadata standards has occurred to a considerable extent.
But open data doesn't automatically mean that everyone can access it: data that can be accessed through a series of steps by established academic researchers like ourselves doesn't necessarily translate into easily accessible data for all. I'm sure we all have our own examples we could talk through here. So I'm going to use this to, kind of sneakily, move on to an exciting NERC-funded programme that we're involved with, which will hopefully be of interest to this audience: the Digital Solutions Hub. This is a programme to tackle some of these issues, in partnership with the fantastic work that the NERC data centres have of course championed for many years. We're looking to connect the UK's environmental data with data in the public and private sector, and this moves us away from having datasets that are only accessible for academic research; we're really trying to understand how we make this data more accessible to all. And of course this has to come with everything we've talked about to this point. The basis of the programme, which requires us to connect environmental data with health data, is the development of trusted research environments (TREs), and we're aligning our work with a huge body of developments and programmes in this space. The features of TREs are really essential to winning and maintaining public trust, where we look at data quality, security and transparency. And this is important not just for health data but for other data that might be seen as sensitive; some environmental data does fit within this category, and I think we'll end up reflecting on this more as we move forward. What we're hoping is that this will help provide the underlying framework where solutions can be developed, new data-driven models and maybe digital twins, but underneath has to be that layer of academic robustness, data quality control and stewardship.
We're already speaking to partners across the UK, and I thought this audience might be interested in some of those outcomes, given the title of this talk. We've already got some important insights. Again, for me, one of the really interesting roles for AI in the environmental space is that connection to policy and practice, but what we're finding is that we absolutely do need to strip this back a little bit and think about the basic data science infrastructure. In many cases, active policy partners in the public and private sector do not know where environmental data is, or who controls that data. From a health perspective, it's seen as far too removed from clinical practice, and yet there's high interest from public health officials. These insights alone help us think about how we deliver this supporting infrastructure to make sure environmental data is actually useful for people sitting in these roles outside of academia. Some really important, and quite depressing, insights as well: a basic lack of access to the latest versions of software, outdated hardware, locked-down laptops. If we're talking about integrating AI-informed solutions, who looks after those solutions? You obviously need an appropriate software and hardware environment to enact those technologies, and we're not quite there. Other insights we of course gain from yourselves, the very active research community that brings together environmental science and developments in AI. We have the Alan Turing Institute special interest group, and I'd like to reflect on some of the insights we've gathered from that initiative. It's a great interest group, which already has nearly 500 members from across the UK, from universities, the public sector and industry.
This interest group is based around four challenge areas: nature-based solutions, climate adaptation and resilience, pathways to net zero, and that linking digital infrastructure. We're really keen to understand how we use this platform that brings everyone together to really facilitate the identity of environmental data science, because I keep talking about the two as separate, but this should be a new area of recognition, I think, as active scientists. We had a face-to-face workshop early last year, and we saw presentations that again showcased this really exciting work between the environmental science domain and AI, from crop type classification, to understanding urban topology from satellite imagery, through to framing narratives around digital twins in really interesting environmental areas. But we also asked the community what it thought the opportunities for AI and data science were in reaching net zero in particular; not only from a positive perspective but also from a negative one: how might this negatively impact our ability to reach net zero? There were some really interesting insights, and this will lead to a commentary paper that's being put together at the moment. There were some obvious statements, and we've talked about some of them here: reframing our methodological focus on old and new problems, developing these hybrid methods for new model developments. But of course we then talked about the potential impacts of that; we've seen many narratives already talking about the carbon footprint of AI by itself. And perhaps the not-so-obvious insights that came from this meeting, which are really interesting, concern this conversation of having to make sure that we embed sustainability by design. And again, this makes us ask important questions: what are our required training platforms? Do we have access to low-power compute facilities? Do we need improved funding in this space?
How do we make this attractive to people who have these emerging talents, you know, this workforce expansion at the interface between environment and AI, effectively in competition with better-paid jobs? How do we make this more of an attractive space for people to come and work in? So there's a lot there, but where does this leave us, I guess? If I was being a little bit blunt, we could ask the question of whether or not we value the slow and hard stuff. "Slow" sounds derogatory; I don't mean that. I think it's more in the sense that it is complex to make sure that there are sustainable platforms to support what we want to do. This includes databases, calibration standards and metadata; we talk about these all the time as scientists. There are potential software solutions to help, but we need the people to champion them, and we need the people to manage them as well. I do think there is renewed vigour in combining mechanistic and empirical model developments. A lot of the challenges that I felt as a researcher many years ago were around computational bottlenecks; of course, the pathway to exascale and investments around that space will perhaps tackle those challenges. Do we always need machine learning to improve model developments? Perhaps not. There are already some conversations outside of the environmental sphere suggesting that pure machine learning representations may have a limited shelf life. And whilst this example comes from a different domain, we had a really interesting talk from one of the Julia developers recently, showcasing programming environments where you can embed machine learning representations and process models in a single-language solution, such that the machine learning model learns from the physics taking place. So again, it's all about embedding more trust, and there are some really exciting developments there. I think we're finding that training is key. And it's a heterogeneous issue.
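A minimal sketch of the hybrid idea mentioned above, a process model corrected by a learned component, might look like the following. This is in Python rather than Julia, with a toy exponential-decay "physics" model and a simple least-squares fit standing in for the neural network a real hybrid or SciML setup would use; all names and numbers here are illustrative assumptions, not any particular model:

```python
import numpy as np

def physics_model(t):
    # Simplified mechanistic model: first-order exponential decay.
    return 100.0 * np.exp(-0.3 * t)

# Synthetic "observations": the real system has an extra linear loss
# term that the process model does not capture, plus noise.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 50)
observed = physics_model(t) - 2.0 * t + rng.normal(0.0, 0.5, t.size)

# Fit a data-driven correction to the physics model's residuals.
# A plain linear least-squares fit stands in for a learned component.
residual = observed - physics_model(t)
A = np.vstack([t, np.ones_like(t)]).T
coeffs, *_ = np.linalg.lstsq(A, residual, rcond=None)

def hybrid_model(tq):
    # Process model plus learned correction.
    return physics_model(tq) + coeffs[0] * tq + coeffs[1]

rmse_physics = float(np.sqrt(np.mean((physics_model(t) - observed) ** 2)))
rmse_hybrid = float(np.sqrt(np.mean((hybrid_model(t) - observed) ** 2)))
print(f"physics-only RMSE: {rmse_physics:.2f}")
print(f"hybrid RMSE: {rmse_hybrid:.2f}")
```

The point of the sketch is the structure, not the fit: the mechanistic core stays inspectable, and the learned part only accounts for what the process model misses, which is one way of keeping the trust the talk emphasises.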
It's more than just downloading a software package, throwing your data at it and getting a result. One thing that I have found in the data science space is there simply isn't enough information on negative results, and I think this again touches on the sustainability issue: if we repeat, over and over again, processes that don't work, can we publish those processes? There are various interesting programming and software environments that could help; sharing workflows and containers may help in that space. We need to develop this attractive environment to build a skilled workforce. There is, of course, a big role for our relationships with big technology vendors; there might be brokers that can bridge the gap that we simply don't have the time or resource to work around. I don't think we can be responsible for all things, but we can start helping to raise this narrative more frequently: that infrastructure and regulation is not a barrier to innovation. And again, we've seen really fantastic examples in the literature on the adoption of machine learning and broader AI, but at the policy interface we need to make sure it's robust. And I think the research life cycle could absolutely benefit from this more sustained infrastructure. The future, I think, is bright in the sense that, you know, we should look underneath our own research environment and look towards graduate skill sets. There have obviously been lots of interesting papers recently talking about the need to consider software in the same way we consider laboratory environments. We're already seeing data science undergraduate programmes; the MSc that we have here has an environmental science pathway, which is one of many pathways of a data science MSc, and there are over 5,000 applicants to that MSc overall. So obviously there's a really exciting movement, and it could be that in five or ten years' time some of these challenges are being met.
I thought it wouldn't really be appropriate if I didn't ask ChatGPT what its thoughts were in this space, so I simply asked it a couple of nights ago: does it think AI is going to drive the future of environmental research? Obviously there was lots of really interesting output, and some themes dropped out. One thing I've highlighted at the bottom, which I think we'd all agree on, is that it is important to remember that AI is a tool. It depends on our expertise, and true success will still depend on interdisciplinarity and well-informed policy decisions, which I think hopefully supports everything we've talked through here. I'd like to end on this slide, if I can, because if you are interested in getting involved with the NERC Digital Solutions Hub as a programme, you are absolutely welcome. We do have scientists, of course, but we also have some job opportunities, which are listed here. Just to comment very briefly on those: we're looking for a business analyst to map user requirements to the software environment, so the architecture we're going to need; a research associate in environmental analytics, someone who has interest and experience in machine learning and statistics, and in developing tools that will sit on the hub to translate the data we'll capture through this new architecture into insights that are of interest to our end users; and also some research software engineers. A lot of NERC's data is geospatial, not all of it, but we require some people who have expertise in GIS. They each have a particular closing date, with starts as soon as possible. But I think I'll stop here, and I'm very happy to take any questions. And oh, my video has stopped, hasn't it; I apologise, I didn't mean to do that. Okay, Dave, thank you very much. That was an excellent talk, wide-ranging, with loads of interesting stuff there. So, yeah, as Josie mentioned earlier, please put questions in the Q&A. We've got time now to go through a few things.
There are a few points here. So, thanks for the advert; we'll have to forgive you the time on that. It was interesting to see that one of those roles was around user research analysis, quite a different set of skills than you might see on a normal basis, so I was interested in that. Some of the solutions to these complex, interdisciplinary approaches might need a different skill set in future, so have you got any thoughts on what other types of skills we're starting to need: ethics, communication skills, those sorts of things? Yeah. I guess what's going to happen, in the same way that it's taken a while, and again thinking in the broader sense as someone who's developed models: obviously there's been a huge movement around software sustainability and best practices, and making sure that you develop an appropriate workflow for sharing academic research so it's reproducible. I guess in the same way there are going to be expectations on practising data scientists that there are appropriate things you bring to the fore with your developments, you know, appropriate for ethics, sharing and others. And also, maybe it's too much for individuals, and there could be supporting infrastructure to support that as well. I talked about some of those workflows, with containers being a particular area of interest for me at the moment. You know, open source code and open data, if I'm being honest, sometimes aren't enough. I mean, we all share our code, but who can run a regional model on an HPC system within an hour? I can't. You know, maybe you can. But I think there's a whole ecosystem here when we talk about people who perhaps want to take research forward at the edge of academia, at the policy interface. That requires a whole different set of considerations, really.
Yeah, maybe, if it's too much to expect to have all those skills in our teams, we need to make use of wider expertise. There have always been people in universities working on ethics and on communications; maybe they need to understand AI better to help us get those messages across. Absolutely. Absolutely. You know, I'm not saying that everyone needs to work at the interface; we can't. There needs to be absolutely clear focus on very hard scientific challenges. But at some point there's a delivery angle that's required. So, still on that aspect of skills: you mentioned making it attractive to work in AI and environmental science, so have you got any thoughts both on how we do that, and on what a data-science-intensive research team might need to look like, the types of skills? I guess it's interesting, isn't it. And in some ways I'm passing the buck from a timescale perspective, because the current narrative, that industry is more attractive because people pay more for skills at the fundamental AI interface, is probably true; I haven't seen the data, but it probably is true. Universities are struggling to recruit technical expertise in certain areas. That being said, I do wonder whether, through the automation of some of these tools, that's going to rebalance itself. Such that, in the same way that some of us (not myself) know how to go into a lab and run a particular common scientific instrument, some of these data science tools will just be part and parcel of a scientist's toolbox, such that from the undergraduate level upwards it's standard practice. Some of them probably will be.
You know, I think ten years ago some of the tools that we can now access within a couple of minutes in a Python environment would have been very hard to develop yourself. So I think it's a mixed bag, this one; I don't have a particular answer to it. We of course have the benefit of solving worthwhile issues. I'm not saying that others don't, but I think that will still remain an attractive position for people who come through the undergraduate system and want to make change, I guess. Yes, we are working in a good area, the environment, that people see as being worthwhile. And I think there are definitely more and more people coming through those master's courses and the like with some level of the skills, and it's just that level, that depth of understanding, that's sometimes hard to get access to. Yeah. So there's a question around the Digital Solutions Hub and making more data available. It's about this increasing amount of data from other spheres: whilst the public data might be quite accessible, industry data may be even harder to access nowadays, so how do we get at it? Is that something we need to put effort into getting around? Yeah, it's an interesting one. The one good outcome from that programme thus far is that all the questions we're asking, everyone's asking. Data accessibility has not been solved anywhere else in the broad sense. There is absolutely true value in bringing in data that we wouldn't otherwise have access to, to understand the science we're interested in. This brings with it requirements around, again, ethics and standards. But, you know, that sounds overly negative; I think we're moving into a space where, through programmes such as this one,
we will be able to make those connections with partners outside of academia and, even if the data wouldn't be openly accessible to all, be able to delve into new insights through partnerships that otherwise wouldn't have been easy to construct. Having data available actually helps frame human conversations and brings people together; it's not just about getting the data available, it's actually about bringing people together over the data. I'm quite optimistic in that space, I think. And it's important; I'm quite passionate about making sure that academia is front and centre in that process, you know, bringing that academic robustness to those conversations, because, as I said before, if you take the smart city space, the number of solution-driven technologies is staggering. And, you know, I think sometimes if you look at local councils, who are pressed to deliver an answer on very short funding timescales, this is where we can get involved. I think that data accessibility will hopefully open up lots of those conversations. That's good. Yeah. So there's a question around data standards and metadata, which you mentioned: what characteristics do we need for AI-enabled data? Oh, that's a good question. This is a huge area: machine-learning-ready data sets, I guess. I think it's not just the data, it's the software tools that develop around it, right? Because, you know, understanding why a model makes the decisions it's making sometimes leads us to certain machine learning architectures that you can interpret better than others. This is a whole active area of development in terms of deep learning that I'm obviously not best placed to comment on. There are existing ontologies and metadata standards that the environmental space has adopted quite well, I would say. So there are already positive movements in this space; I can't see there being a major gap in terms of metadata standards, although someone correct me if I'm wrong.
There are developments in this space, there are standards we can adopt; I think what we need to do is sit around the table and have a review of the current state of environmental data. Before we connect that data with new models that are delivered to implement change, we need a basic review of what we're missing in terms of metadata standards, provenance, calibration standards, who owns the data, and other things, so that whatever models are developed, we can look back at the data that was used. So I'm not sure I'm answering your question in particular; I think it's just a huge area of work, but hopefully through Digital Solutions, again, we have the Turing involved to help us think through what we mean by machine-learning-ready national data sets, and what that requires. So this is one from me, then. You're involved in monitoring data and approaches to monitoring, and I kind of feel that, well, there's monitoring for the sake of seeing what's going on in more or less real time, but in terms of research we do monitoring to test models. In the past those have been process models, and maybe that's informed the way we've decided to do monitoring. Do you think we might evolve the way we do monitoring to suit the testing of AI methods, or even to provide the data to drive AI methods? Do you think there's a difference there? I think it's going to be both, to be honest. I mean, again, the simple example that I provided here, using a regional model to at least understand where the local data sat against our understanding from the data-driven model, was a great example to me of how, provided you have access to those models, you can understand the broader sphere. I think the direction of travel seems to be that, to integrate the complexity we're now seeing through our monitoring platforms, the hybrid approach is going to have to be the one that we adopt.
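Picking up the point earlier in that answer about reviewing metadata standards and provenance: in practice, a review like that could start with something as simple as checking each dataset record against an agreed list of required fields. The sketch below is hypothetical; the field names are illustrative, not drawn from any particular metadata standard:

```python
# Hypothetical "ML-readiness" check: the required field names below are
# illustrative assumptions, not any specific environmental standard.
REQUIRED_FIELDS = {"provenance", "calibration_standard", "owner", "units", "licence"}

def missing_metadata(record: dict) -> set:
    """Return the required metadata fields absent from a dataset record."""
    return REQUIRED_FIELDS - set(record.get("metadata", {}))

# Example dataset record with incomplete metadata.
sensor_record = {
    "name": "urban_no2_sensor_network",
    "metadata": {
        "provenance": "roadside sensor network, hourly averages",
        "owner": "local authority",
        "units": "ug/m3",
    },
}

gaps = missing_metadata(sensor_record)
print(sorted(gaps))  # prints ['calibration_standard', 'licence']
```

Trivial as it is, a check like this makes the "what are we missing" review auditable: whatever models are later built on the data, you can point back at which records had provenance and calibration information and which did not.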
So, you know, process-driven models are always going to be important. I think, to take them to the next step, moving away from traditional parameterisations to embedding data science tools within those process models is where you see developments in non-environmental areas: particularly, for example, the medical industry, where there's been in some cases an outright ban on pure data-driven models, because you can't tease apart the explainability of the outcome. I think we can adopt that, but again, do we rewrite our existing models? What are the appropriate tools we need to enable that? There are some really interesting developments in the US and Europe, where large-scale kernels are being rewritten in these new languages that will enable people to bring the process models back in, but then also bring in the machine learning tools to run alongside. Sounds like a digital twin, and there was a question on that: what do we need to do to keep growing the role of digital twinning, and where is it going to be useful? If I'm being honest, I'll be controversial; I mean, I'd like to get the audience's opinion on what a digital twin of the Earth actually means. I think digital shadows we understand, and we understand the digital twin of a physical object we can change, but for many natural systems I'm just not sure at the moment that we're properly boxing where a digital twin will be absolutely game-changing and where it will mean we revert back to models. I'm really interested in that conversation, and I don't have all the answers, but I do think the term is being used too interchangeably at the moment in the environmental space. That's a controversial point to end on, probably, isn't it? Okay, here's one more, possibly more controversial: how should future funding mechanisms from the research councils reflect the changes in science and technology that we need?
What can be done to encourage and foster these areas in our science? I guess, if I was looking at this from an operational perspective, what I'd like to see would be various business models that reflect the aspirational needs of the community. So: how much money would it take to fund people and processes in all areas, and then grade down from that, from our current operation through to the completely aspirational, and really work out what other people and skill sets we need. Am I answering this question? I guess I don't think we have enough headspace, time and resource from an infrastructure perspective. I think this needs to be drawn out more, because ultimately this will benefit the blue-skies science moving forward. But I think it does require that economic breakdown, and the community should take part in that, I think. That's a great answer, David, thanks very much. I think that's probably all we've got time for today, so, yes, thanks again on behalf of everybody; thank you, David Topping, for your presentation and for the discussion. Just to remind everybody that we've recorded the session and will make that available soon to watch again on the website and on YouTube. Just to remind you also to subscribe to that YouTube channel, which will help us in the promotion of the series; I think the link is going in the chat again. And the next session, I think, is in three weeks' time; it's from Sari Giering of the National Oceanography Centre and Rob Blackwell of Cefas, on how AI can transform, and is transforming, the monitoring of ocean biodiversity. So do look forward to that one; make sure you note it in your diaries and book a place with the Zoom link on the website. Yeah, thanks to everyone for attending, and thanks again to our speaker, and hope to see you again. Thank you.