So yeah, welcome everyone, thanks for joining us. My name is Matt Fry from the UK Centre for Ecology and Hydrology, and I'm hosting this, the fifth webinar in this AI in Environmental Science series, which is supported by NERC and the Constructing a Digital Environment programme. This is a programme aiming to develop a digitally enabled environment benefiting researchers, policymakers, businesses, communities and individuals alike. It's been running for a number of years now, with the aim of envisaging and developing approaches to creating the future digital environment, exploiting advances in technology and increasingly diverse datasets to improve our understanding and management of the environment. It's done this through a number of funded projects and a range of other activities, building a community in the area of the digital environment and running events, including a successful conference last year. Just to remind everyone, NERC's Digital Gathering 2023 is on the 10th to 11th of July at the British Antarctic Survey offices in Cambridge; I'm going to post a link in the chat now. It's still open for registrations and for submission of abstracts until the 30th of June, so please do take a look and come along. This webinar series has covered a fantastic range of subject areas, from environmental sensors to data management, legal and ethical aspects of the digital environment, and decision-making, as well as showcasing some of the projects in the Digital Environment programme. This is the seventh series of webinars, and there's now a fantastic resource of webinars on YouTube: if you go to the digital environment page, you'll find links through to all the different series, organised by subject matter. This seventh series considers the role and the opportunities, as well as some of the pitfalls, of the use of AI in environmental science. The format is an invited presentation from a leading expert in the field, followed by a chance for questions and answers. I'd like to invite the audience now to subscribe to the YouTube channel, if you haven't already done so, to see the rest of the talks; the link for that is going to go in the chat as well. AI tools are enabling new analytical value to be delivered from existing sources of data, as well as providing powerful tools for gathering new data, and this webinar series covers activities across this area. I'm very excited to say that today's presentation is a seminar from Rossella Arcucci from Imperial College London, talking about data learning, which is integrating data assimilation and machine learning for reliable AI models. Rossella is a senior lecturer in data science and machine learning at Imperial College London. She's an elected member of the World Meteorological Organization and an elected speaker of the AI Network of Excellence at Imperial College, where she represents more than 270 academics working on AI. She's been with the Data Science Institute at Imperial College since 2017, where she created, and leads, the Data Assimilation and Machine Learning (Data Learning) Group.

Thank you very much, Matt, for the introduction. First of all, welcome everybody, and thanks for joining us today for this webinar.
Today I'm going to give an overview of what we mean by data learning, what we mean by this integration of data assimilation and machine learning, and especially I will focus on a few aspects: why we want to do it, what we do, and what types of applications we are working on, especially in the context of environmental science, given the topic of the webinar. I will show you quite a few applications, but the talk will be high level; I'm happy to have follow-up discussions with anyone who wants to go deeper into the details. So, the content of the next few slides: essentially, I will give an overview of data learning in the context of AI, because, as the title says, we do this for reliable AI models, so it's good to be on the same page about what we mean by that. As I said, I will show you a few data learning models and give you examples of real-world applications.

Just so we are on the same page over the next slides: I will focus quite a lot on the balance between efficiency and accuracy of the models we are developing. When I say efficiency, I mean literally computational cost; efficiency can mean different things in different contexts, but in this context we literally mean computational cost. As for accuracy, you can also have different definitions in different contexts; here we mean that we want our models to be as close as possible to the real-world scenario. And so, reliable AI models, but what is an AI model for us? Usually when I give this kind of talk in person, I ask people to raise their hand if they already have an idea of what AI is; virtually, let's say, you already have an idea anyway, we are in the era of artificial intelligence. So, just to be sure we are on the same page (yes, a few people have raised their hands): AI is essentially the ability of a digital computer, or a computer-controlled robot, to perform tasks that would usually be associated with an intelligent being. In this context, as Matt was saying in the introduction, I work as AI speaker at Imperial, and there are lots of people working on AI from different perspectives; over the next few slides, the AI models I will show you are mostly based on something called artificial neural networks. Why do we say artificial neural networks? Essentially, we are trying to replicate with models, let's say, what we have in our brain: this is the basis of our artificial intelligence models. As in our brain we have biological neurons connected to one another by synapses, in the same way in our models we have neurons, with inputs and outputs, connected among themselves. The difference is that, obviously, in our brain there are chemical reactions, while in our artificial neurons there is something called a processing unit. I have to say I'm a mathematician, so I can sometimes be really boring; I can say "boring" because, as a mathematician, I can say that without any complaint from my colleagues. I will not give you details of the maths behind the processing units, but, for people who are not completely familiar with what an artificial neural network is: behind this processing unit there is lots of maths; there are functions, cost functions, loss functions to minimise, lots of maths.
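A minimal sketch of what that means in code, with invented data and sizes (this is not code from the talk): each "processing unit" is a weighted sum of its inputs passed through a nonlinearity, and training means driving a loss function down.

```python
import numpy as np

# One artificial "processing unit": a weighted sum of the inputs
# passed through a nonlinear activation function.
def neuron(x, w, b):
    return np.tanh(x @ w + b)

# A tiny two-layer network: inputs -> hidden neurons -> scalar output.
def forward(x, w1, b1, w2, b2):
    hidden = np.tanh(x @ w1 + b1)      # a layer of processing units
    return hidden @ w2 + b2

# The loss function that training tries to minimise (mean squared error).
def loss(x, y, w1, b1, w2, b2):
    return np.mean((forward(x, w1, b1, w2, b2) - y) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))          # 100 samples, 3 input features
y = rng.normal(size=(100, 1))          # invented targets
w1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
print(loss(x, y, w1, b1, w2, b2))      # training would drive this down
```

Training adjusts the weights and biases to minimise this loss, typically by gradient descent; that is the maths the speaker is glossing over.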
But as I said, this talk will be quite high level; it's just to show you why you might want to do this and what you can do with it, and then we can have follow-up discussions about the how. So, just as with the neurons in our brain: especially when we are children, we have a lot of information coming into our brain from the things we see, smell, touch, et cetera; lots of information comes into our brain and develops the networks of our neurons. In the same way, artificially, we need data for that. When I say data, and you will see this over the next slides, I mean literally almost everything: this can be any kind of information, medical data, data from social media; any source of information you can imagine can actually be used for these types of models. And I have to say we are very lucky, because we are in the era of data. Just to give you an overview of the possibilities, considering different sources of data and information: the data I'm showing over the next slides come from computational modelling, for example for weather forecasts, or computational models for other environmental forecasts (for example, you can see here Elephant and Castle in London), data from social media, data from sensors, satellites, so remote sensing, and also numerical data coming from, for example, financial markets. I will show you examples of some of these.

So, why do we want to do this? What is the main motivation behind the development of these models? The reason is that we are in the so-called era of digital twins. Essentially, a digital twin is literally a digital version of something real. You can do this for healthcare: for example, in one of our UKRI projects we have the modelling of lungs to study the impact of pollution. You can do it for energy applications; you can do it for any real-world application that can give you enough sources of information to be digitalised. So when I say that, people feel: oh yes, I understand digital twins. But actually, what is it? Lots of people are talking about digital twins today, but the key questions, when you go to the core of the point and want to develop a digital twin, are: how can you do it? What is it? What are the benefits you can have, and what are the problems you have to face in developing this kind of thing? So what is actually a digital twin for us? In my group we are also working on digital twins, and obviously the interest is that when you have something completely digitalised, you can simulate scenarios, so you can assess risk and things like that; I will show you a few applications of this. What a digital twin is, in the context of AI and data learning, machine learning, is essentially a digital model, data-driven, but not completely data-driven most of the time: there is also some physics, and so most of the time we talk about physics-informed machine learning models. At all times, you want to be sure that your model is realistic and, especially, reliable, so you have to include uncertainty quantification all the time, and especially the minimisation of this uncertainty. And you also want to work in the context of explainable models, so in a context of explainable AI.
Especially because most of the people we collaborate with, as data scientists, come from other fields, for example physics or biology, you cannot just say: this is a black box, it works. They want to understand what is behind it, so we try to provide explanations of what is behind our models. And when you try to develop these kinds of models, most of the time, we can say, it is not straightforward. Having data, even meaningful data, is already a good step: sometimes you want to answer a question from the real world and you don't have meaningful data for answering that question. But assuming you have it, you still have, most of the time, a few problems to face. Sometimes the data are not enough, or the data are completely unstructured. I'm sure you have seen lots of beautiful models online showing how well machine learning can work with images of cats and dogs; but then you work in environmental science and the data come from sensors, so they are partly unstructured and you cannot apply the same methodology. Another problem: in these types of real-world applications, the data are most of the time too big; you cannot process them, sometimes even on a good machine, and you need supercomputers. And especially, these types of data are updated all the time: you have new information coming, for example from satellites and sensors, at all times, and you want to be sure your model can be fine-tuned properly. So this is the focus of this talk: I will show you how to face some of these problems; if one of these problems comes up, even if you have meaningful data, what is the strategy you can implement?

So, as we said, we have these data and a few problems, mostly related to dimensionality constraints, noise in the data, and so on. This is the reason why, when we work with machine learning, we don't only work with machine learning models, with deep learning models especially, but we couple them, we integrate data science with them. Implementing good data science models before, or alongside, your deep learning and machine learning models is the missing piece. In our case, the data science technology we use most of the time is data assimilation, and this is the reason why we call our models data learning. Data learning models are completely general; I will show you examples of this. We apply them to social science, to medicine, to lots of geoscience, in a lot of different contexts, in different projects in collaboration with other universities, and I will show you some of these applications. If you are also interested in how you can implement these models, I am suggesting this paper, which is like a data learning 1.0 (we are now publishing data learning 2.0). It is essentially an overview of the possibilities and the different models you can apply to real-world scenarios, depending on the type of questions you want to answer. In the paper you will see lots of different models applied to lots of different applications, so somebody can get lost, because you have data assimilation with reduced-order models, then with principal component analysis, for example, with neural networks, with autoencoders, encoder-decoders, Gaussian processes, data assimilation with Kalman filters, and convolutional neural networks; so people feel: okay, lots of stuff. As a small taste, the sketch below shows the dimensionality-reduction step that appears in many of these pipelines.
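A minimal illustration of that reduced-order building block, assuming nothing about the paper's actual implementation: random data stands in for high-dimensional model snapshots, and principal component analysis provides the compression.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for a high-dimensional model state:
# 200 snapshots of a field discretised on 2,000 grid points.
rng = np.random.default_rng(0)
snapshots = rng.normal(size=(200, 2_000))

# Reduced-order step: project the state onto a handful of
# principal components before any learning is attempted.
pca = PCA(n_components=20)
latent = pca.fit_transform(snapshots)        # shape (200, 20)

# Any downstream ML model (neural network, Gaussian process, ...)
# now works in 20 dimensions instead of 2,000.
reconstructed = pca.inverse_transform(latent)
print(latent.shape, reconstructed.shape)
```

In the data learning pipelines the same role can be played by an autoencoder; PCA is simply the plainest choice of compression.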
But what is the key point? What is the real expertise of a data scientist? When somebody, for example from a company, or somebody who is not an expert in data science, comes to the data scientist saying "I have this problem, can you help me?", the key point is understanding which model you need to apply. There are different ways, obviously; everybody works with their own approach, they are experts in something and they try to apply that model to the specific real-world application. I can share with you my own way of deciding the first direction to follow when a real question from the real world comes to be answered. Essentially, the way I weigh up the different models, to choose the best one for a specific question from a real-world application, rests on this balance between accuracy and efficiency, and this comes from the application. To give you an example: if you want to work on healthcare modelling, and I'll give you just one example, we work on lung modelling, like detection of disease in lungs, do you care more about accuracy or more about efficiency? The point is that if you get your prediction in six hours, it's not a big deal, but you want to be really accurate: you want to know whether you are predicting the disease with a high level of accuracy; you want to be sure about that. On the other side, if you are talking about storms, hurricanes, wildfires (I will show you something about the wildfire work we developed in collaboration with the Leverhulme Centre for Wildfires): do you mind if the prediction of the fire front is one metre further towards the north, or one metre less towards the south? If what you really want to understand is where it is going now, what the direction is, then one metre can be an approximation; who cares?
You want to be accurate, of course, but that level of accuracy is not what matters to you; what is really important is efficiency. So it is application oriented on one side; on the other side it is data oriented, depending on the availability of the data you have. You may have historical data, and if you have good historical data you can clean it and use it; or you have more of an online problem. When I say online, I mean that you have data coming in all the time and you need to integrate these data into your system, because new data, in your case, can mean completely different scenarios; this is what happens with weather forecasts, for example: you want to update the system. So, let me give you three examples of the opportunities. As I said, this will be very high level, just showing you the opportunities; then I'm super happy to have follow-up conversations, you can always send me an email and we can have a chat about the models behind these applications.

The first is optimal data selection. This is key, and it happens to us sometimes: people come and say "okay, can you help us answer this question, we have these data", and you look at the data and you think: why these data instead of other data? Why didn't you come before collecting the data? Sometimes it's better if we can suggest what data you should collect. So I'm giving you an example of an application in environmental science; this one was quite useful for us during COVID, for understanding the air quality in indoor environments. You want to understand what the air quality is in an indoor environment, and you have data from sensors, and you place these sensors just randomly; and you see that if you place the sensors only randomly, you get a certain error, in that case not too bad, 0.17. But if you use these technologies based on Gaussian processes, mutual information and data assimilation, you can actually get a good reconstruction of the air quality within the room, and the error, the MSE, the mean squared error, goes down from 0.17 to 0.0005. The key point, and I'm mentioning the indoor environment here, but this is, as I said, very general, is that you can do the same for outdoor environments, and we have done that in the MAGIC project, a big EPSRC project: for outdoor environments, it depends on where you place the sensors, because in the streets you have natural ventilation, you have winds, so depending on where you place the sensor you can get a realistic measure of the air quality in that place, or a completely unrealistic one, because it's just one particular point, one particular spot. So what is key in these types of technology, the collection of the data in the proper places, is that with just a few points, a few data, you can have a very good representativeness of the entire scenario. This is quite important in the context of smart grids; obviously we are now in the era of smart buildings, and we wish to have smart buildings telling us "open the window" or "close the window" depending on the air quality inside and outside, for example, or the indoor temperature, these types of suggestions. But if you want a clear vision of what is indoors and what is outdoors, to have these smart systems, you need good data, a good collection of data, and so this becomes key in these types of applications; a small sketch of this kind of sensor-placement logic follows.
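The method described combines Gaussian processes, mutual information and data assimilation; the sketch below is a simpler greedy cousin of that idea, placing each new sensor where the Gaussian-process posterior variance is largest. The room grid, kernel and lengthscale are all invented for illustration.

```python
import numpy as np

# Squared-exponential kernel over candidate sensor locations.
def kernel(A, B, length=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length ** 2))

# Greedy placement: repeatedly pick the candidate location whose
# GP posterior variance (given the sensors chosen so far) is largest.
def greedy_placement(candidates, n_sensors, noise=1e-6):
    chosen = []
    K = kernel(candidates, candidates)
    for _ in range(n_sensors):
        var = np.diag(K).copy()
        if chosen:
            Kss = K[np.ix_(chosen, chosen)] + noise * np.eye(len(chosen))
            Kcs = K[:, chosen]
            var -= np.einsum("ij,jk,ik->i", Kcs, np.linalg.inv(Kss), Kcs)
        var[chosen] = -np.inf          # never pick the same spot twice
        chosen.append(int(np.argmax(var)))
    return chosen

grid = np.random.rand(200, 2)          # candidate points in a 2D room
print(greedy_placement(grid, n_sensors=5))
```

The mutual-information criterion used in the real work refines this: instead of the most uncertain point, it picks the point that is most informative about everywhere else.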
Another type of problem is when you say: okay, I have my own idea about the physics behind my model, but I don't have a clear idea of the values of the parameters. To give you a completely different scenario, so you don't feel these models can be applied only to specific applications: one of the applications we considered was bitcoin, and in general the cryptocurrency market, because it is so volatile and crazy that you cannot really imagine having a clear vision of the physics behind it. So we imagined having a description of an economic model, and then we tried to learn its parameters, describing how the parameters were evolving in time. Using these types of models, the data learning models I'm sharing in these papers, what you can essentially do is learn the values of these parameters from the available data in time, in real time, then make a prediction a few time steps ahead, and then re-update, and re-update, and re-update. When you do that every few time steps, you get really good accuracy in short-term forecasting.

And luckily these models are general enough, because then the pandemic actually exploded. At the very beginning, I was at that moment a research fellow at the Data Science Institute, and one of my colleagues, she's from Wuhan (she's now a lecturer at another university), and I were focusing on this scenario, on what was happening in Wuhan. All we had was the power of our technologies, nothing else, let's say, just a few data points; but with the ability to work with a lack of information, you can really try to learn something, and we were able to present first results at the Royal Society in February 2020; and then, if you remember, in March it was declared a pandemic. We were focusing on Wuhan at that specific moment, and essentially we applied the same technologies we had developed for the cryptocurrency market and used them for an epidemiological model, exactly the same way: we had another model with different parameters, the transmission parameters, and a few data points coming from hospitals, coming from the government, and we were learning these parameters, trying to make the right prediction. Of course, for the cryptocurrency market you can imagine why you would want to do that; but for the pandemic, the key point of building these digital twins of the scenario was that having a digital twin can then help you simulate different scenarios: for example, simulating mitigation effects (if you remember the two-metre rule) or simulating suppression effects, lockdown, so no contacts at all. Having a good digital twin helped us simulate these types of scenarios, and, as you can see, we had quite a good agreement between what the model was predicting and what then happened in real life. A toy sketch of this learn-predict-update loop is below.
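A toy version of that loop, assuming a textbook SIR model rather than the model actually used in the work: fit the unknown parameters to a window of noisy observations; in practice one would then forecast a few steps ahead, wait for new data, and re-fit.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import least_squares

# Toy SIR epidemic model; beta (transmission) and gamma (recovery)
# are the unknown parameters we keep re-learning from incoming data.
def sir(y, t, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

def infected(params, t, y0):
    return odeint(sir, y0, t, args=tuple(params))[:, 1]

# Synthetic "observed" infections over a short window of time.
t = np.linspace(0, 20, 21)
y0 = [0.99, 0.01, 0.0]
observed = infected([0.4, 0.1], t, y0) + np.random.normal(0, 1e-3, t.size)

# Re-fit the parameters on the latest window; forecasting a few steps
# ahead and repeating as new data arrive gives the update loop described.
fit = least_squares(lambda p: infected(p, t, y0) - observed,
                    x0=[0.2, 0.2], bounds=([0, 0], [2, 2]))
print("learned beta, gamma:", fit.x)
```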
Still in the context of data, when you want to improve accuracy and so on, there is this problem of data augmentation. In some applications you don't have enough data, because collecting data can be very expensive; expensive in terms of computational time, or expensive in terms of money, because, for example, with data coming from the laboratory for drug discovery, they cannot really produce lots of data, since any time they do experiments it's costly. So you want to work on data augmentation, increasing the data you have by simulating synthetic data. Now, I'm not sure you know this website, This Person Does Not Exist; we don't have time now to navigate it and play together, but if you just Google it, any time you refresh the page you see a different person, completely artificially simulated. This person doesn't exist, this person doesn't exist, this one doesn't exist; for some of them you would never say they don't exist, but these people are completely artificially generated. So you say: okay, but environmental science, why is this important to you?

Now imagine you're working on a completely different application: wave energy converters. In one of my projects we work on wave energy converters, and the computational fluid dynamics simulations of the wave energy converters, what happens and how much energy they produce depending on different waves, et cetera, are so costly in terms of computational time that we tried to answer this question: can we generate synthetic data, 3D synthetic data, with a physical meaning? Because that is what the application needs. What you see is a 2D slice, but this is a 3D simulation of a wave energy converter, and it is too costly, so we were checking whether it was possible, and I'm very happy to say that yes, you can. Obviously it's not completely straightforward, but at the end of the presentation I will share the codes, our GitHub, so you can have access to all of this. The point in that case was that this simulation needed around two weeks to run; when you generate it completely synthetically, you need only 0.05 seconds for 100 samples within the reduced space, and about 3.48 seconds when you go back to the actual physical space, let's say the space that our physicists would expect to see. Compared to two weeks, 3.48 seconds is a great achievement. Obviously we have to be careful about checking these data, to be sure they have a physical meaning; I have to say that in everything we do, we are always in a multidisciplinary group, so that we always have feedback from our colleagues, just to be sure we are not building digital twins of something unrealistic; we want to be sure this can be used by the people who have to use it.

And then, when you have data and you want to update your model with them: data assimilation. What is data assimilation, if you are not familiar with it? For example, here at the bottom you have a simulation of a car accelerating at a junction, and at the top you have the reconstruction from a sensor. The computational fluid dynamics is saying "this is what is happening"; the sensor is saying "no, it is this". So you want to integrate this information into your system, and you can do that with this so-called data assimilation; you can do it with a standard approach, which I'm showing you here, or with something more data-driven, like this one. The difference between a standard approach and something completely based on machine learning is that you can be faster while keeping the same accuracy: with machine learning you can be one order of magnitude faster, and that can be key when you want to update information in real time. The classic update at the heart of data assimilation looks like the sketch below.
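A minimal sketch of that standard update, the best-linear-unbiased (Kalman analysis) step, with invented numbers: the model forecast is corrected towards the observation, and the covariance matrices B and R encode how much each source is trusted.

```python
import numpy as np

# Minimal data-assimilation update: blend a model forecast xb with an
# observation y, weighted by the error covariances B (model) and R (sensor).
def analysis(xb, B, y, H, R):
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # Kalman gain
    return xb + K @ (y - H @ xb)                   # corrected state

xb = np.array([1.0, 2.0])            # model background state
B = np.diag([0.5, 0.5])              # trust in the model
H = np.array([[1.0, 0.0]])           # we only observe the first variable
y = np.array([1.8])                  # what the sensor says
R = np.array([[0.1]])                # trust in the sensor

print(analysis(xb, B, y, H, R))      # state nudged towards the observation
```

The data-driven variants mentioned in the talk learn parts of this update instead of computing them directly, which is where the order-of-magnitude speed-up comes from.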
One real-world application of this is something we do with satellites and wildfires: we have our wildfire simulation, but then, to adjust the fire front and what's going on, we keep ingesting information from data coming from satellites. As you can imagine, this must happen in almost real time, so it must be very, very quick: if not seconds, sometimes even milliseconds. Sometimes it is data assimilation for simulations, but sometimes we also have data fusion; as we said at the beginning, we are in the era of data, so we have data coming from lots of different sources. What is data fusion in this scenario? Just to stay on track with the wildfire application, one question we were answering is this: with these wildfire applications, satellites are okay for forecasting, but for nowcasting, when you really want to see and understand whether a wildfire is happening now, satellites are not great, because there is the time for the satellite to be there, take the image, and then send the image to our processing unit where we check it; there is a time lag of at least three hours, and three hours for a wildfire can be quite a lot. So we were wondering: can we use other sources of information to understand whether something is happening, and then adjust with the satellites? The answer was: let's try with social media, and this is something we have done. Essentially, we have built a system that is checking tweets all the time, filtering, of course, and checking the reliability of the tweets and so on; and then, if something is happening somewhere, you can simulate what is going on, because with a tweet you also have the geolocation, or you are able to reconstruct the geolocation, and combining these data with data from satellites can help us reduce this time lag quite a lot. So this is something we have done: you can see here what we can predict just with the tweets, and then we adjust it afterwards. The mean squared error for the geolocation of these types of values is small, because, of course, when you are tweeting about a wildfire you are hopefully not in the centre of the fire, otherwise you are not typing on your phone, you are maybe running; but even if you are some way off, this gives us at least the region that we have to check with the satellite. So this was data fusion; and beyond the fusion of satellites and social media, you can have, for example, the fusion of two different social media, data coming from Twitter and data coming from Reddit: you can still apply these technologies. This is just to give you an overview, and if you are thinking "yes, but my problem has slightly different data", this is just to say that maybe you can still use this, so you may want to take a look. A sketch of the simplest kind of fusion is below.
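This is not the group's pipeline, just the textbook building block underneath any such fusion: two noisy estimates of the same quantity, here an invented tweet-derived location and a sharper satellite fix, combined by inverse-variance weighting.

```python
import numpy as np

# Fuse two estimates of the same location, each weighted by the
# inverse of its error covariance (the simplest flavour of data fusion).
def fuse(x1, P1, x2, P2):
    W1, W2 = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(W1 + W2)            # fused covariance
    return P @ (W1 @ x1 + W2 @ x2), P     # fused estimate

tweet_loc = np.array([51.51, -0.13])      # rough geolocation from tweets
tweet_cov = np.diag([0.10, 0.10])         # large uncertainty

sat_loc = np.array([51.49, -0.12])        # later, sharper satellite fix
sat_cov = np.diag([0.01, 0.01])           # small uncertainty

loc, cov = fuse(tweet_loc, tweet_cov, sat_loc, sat_cov)
print(loc)                                 # lands close to the satellite fix
```

The fused estimate sits between the two sources, pulled towards whichever one is trusted more.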
Still on merging and fusion of data: during the pandemic we were trying to understand what happens when somebody is sneezing or coughing, but we cannot really simulate all types of noses and mouths, right? We are all completely different, and in a simulation our nose is literally the inlet of the injection into the air. So what we did was build a system that gives us a general simulation, which can then be adjusted depending on the different shapes of the nose or the mouth and so on; so this was more data fusion.

Then, when you have all this work done with data, sometimes you want to build something called surrogate models: models that are able to emulate the digital twin. Let me give you an example. This is the campus of Imperial College at South Kensington; this round building here is the Albert Hall, this is Hyde Park, and this is what happens in the morning when all the bars and cafés and restaurants are opening, so the pollution is starting; beautiful, you can have a good digital twin with that. However, the point is that producing these simulations takes a crazy amount of execution time; it's really slow. So we were trying to build models, completely data-driven, that are able to do what the computational fluid dynamics does, but in real time. Just to be honest about this: at the very beginning, years ago when we started, our models were able to simulate this but then started exploding after a few time steps; then we started implementing the concept of adversarial training, and our surrogate models are now more stable and reliable. And just to mention that behind these models there are no beautiful images of the scenario, but actually completely unstructured grids, 3D unstructured meshes; this has been one of the main challenges for us, because we weren't able to apply most of the available deep learning and machine learning models, and we had to build our own versions. There is also something more sophisticated, which I mention only because I'm sharing all the codes with you, that can give you even higher and longer stability when you run your simulation with your data. Obviously, to train this model you need time series, and if the model is too big, let's say so big that you cannot do it on your own computer, you can still develop something called domain decomposition: you decompose the data properly and run it on big supercomputers, and this is what we have done, for example, with the Barcelona Supercomputing Center.

One example of the impact of these types of models is something we really care about: obviously we are working towards net zero, right? The point is that sometimes you try to simulate these scenarios to reduce emissions, and then your simulation is so heavy that you are consuming lots of power, so you are actually emitting a lot. Something we try to take into account when we develop models is to be sure that our models are also efficient in that respect; we want to be sure we are not running models all the time and emitting; this is an important aspect for us. I'm heading towards the end, but, as I told you at the very beginning, this is completely general, and just to show that, I'm sharing here another application: this is essentially oil and water in a pipe, and the computational fluid dynamics simulation for this was taking 40 hours, while our data learning model takes only one minute for the same result. A stripped-down sketch of the surrogate idea is below.
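This is not the group's architecture (theirs handles unstructured 3D meshes and uses adversarial training for stability); it is the plainest possible surrogate, with invented data: compress snapshots, learn a next-step map in the reduced space, and roll it forward.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

# Invented stand-in for CFD output: 300 snapshots on 5,000 grid points.
rng = np.random.default_rng(0)
snapshots = rng.normal(size=(300, 5_000))

# Compress every snapshot to a 10-dimensional latent vector.
pca = PCA(n_components=10)
z = pca.fit_transform(snapshots)

# Learn the next-step map z_t -> z_{t+1} in the latent space.
stepper = Ridge(alpha=1e-3).fit(z[:-1], z[1:])

# Roll the surrogate forward: milliseconds instead of hours of CFD.
# Naive rollouts like this can drift or blow up over long horizons,
# which is the instability that adversarial training is used to tame.
state = z[-1:]
for _ in range(50):
    state = stepper.predict(state)

forecast_field = pca.inverse_transform(state)   # back to physical space
print(forecast_field.shape)
```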
We have explored the very macro, like wildfires, and then something in the middle, like engineering and pollution; but can we do it also for the micro? For example, if I want to use these data learning technologies for droplet coalescence, droplet conformations, for example silver nanoparticles: we work on that within another of our projects, PREMIERE, so yes, you can still do it; you can apply these types of models from the macro to the micro. Obviously, all of this would be impossible for me without the help of all the postdocs and PhD students working in my group, the Data Learning Group, and I always want to say thank you to all of them. A couple of things I want to share with you: we also have weekly meetings and a YouTube channel with talks from invited speakers coming from other universities and companies, so in case you are interested, please take a look; if you are specifically interested in the topic, there is a list of these talks and a link to our YouTube channel. Also, if you are interested in this integration of data assimilation, machine learning and dynamical systems, there is a community behind it, a growing international community, and we have an annual meeting; this year it will be in Prague. I'm sharing this with you because, if you are interested, you can check the proceedings, because people submit papers for it; so if you are interested in understanding the state of the art, I'm happy to share. And, as I said, we also have a GitHub: all the codes we have developed for these applications are on our GitHub, happy to share. With this I just want to say thank you very much again, and I'm happy to answer any questions.

Thank you very much, Rossella, that's a fascinating, excellent talk. People have already posted questions in the Q&A, and we've got a few there ready, so I'll just read them out, if that's okay, Rossella. Firstly, from Matthew Smith: do you equate a digital twin with a data-constrained model, and if not, what are the key differences?

For me, I can say that a digital twin, a good digital twin, an efficient and effective digital twin, is a combination of models and data. It must not be only data-driven, and not only physics-driven: an effective digital twin comes from the fusion of the two.

On that point, when we talk about data assimilation methods, it sounded like some of those were assimilating into statistical or data-driven models, and some into process-based models. With the process-based models, which I guess a lot of people in environmental science are interested in, what is the assimilation adjusting? Is it model states at a particular time, or changing model parameters, things like that?

Yes. Something I didn't dwell on in this presentation is essentially why you do data assimilation: most of the time you do it because, when you adjust the initial condition, or the boundary condition, or the parameters of your model, you then have an impact on the output, right? Usually when we do the assimilation we fix one time step: we arrive at that time step and we ingest new information. For example, for the air pollution propagation, the air flow, it was the initial condition and the boundary conditions; in this software called Fluidity, but it's the same with OpenFOAM or other software, essentially you have your checkpoint,
so you change the checkpoint because you have learned something external, and then you continue with your simulation. This can also be done for parameters, and that is what we did, for example, for the epidemiological model: adjusting the parameters depending on what you can learn. So: initial conditions, boundary conditions, or parameters.

Okay, thanks very much. I've got another one related to the wildfire example and the tweeting. It says: if those tweets were archived and not live, you're actually only getting historical data, and that would be available even through the historical satellite data; so why not use the satellite data rather than the tweets, which might have bias in them? So I guess it's: can you get the tweets in very real time?

Yeah, you can, you can have the tweets in real time. And I have to say that at the very beginning, when we started the study, the PhD student working with me was also monitoring my Twitter account. If you tweet on Twitter, with 99.9% probability your tweets are available, so with the API you can monitor what people are talking about, and whether they are talking about wildfire or fire or something like that; obviously with a filter, because if somebody writes "I'm on fire" it doesn't mean that it is a wildfire, but with proper filters you can monitor that in real time, yes.

Yeah, we had the same with a drought project looking at text, and finding lots of "goal droughts", football droughts being the most common. Somebody's asking again about the digital twin, and I guess it's the same question: it sounds like your definition might just be that a digital twin is a model of something, or is it more complicated than that?

It is the model plus an adjustment from the data; so the real-time part, it is in real time.

Okay, that's great, thanks. Digging into a bit more detail, Giovanni says: that's a fascinating talk, thank you; could you please further elaborate on how noise in experimental data is managed, for example in air quality applications? Is it filtered or modelled? I would expect forecasting or nowcasting performance to be significantly affected by noise levels.

Oh, absolutely. Essentially, something that is really important, and maybe I forgot to mention it in this talk, is that lots of people, especially in the machine learning community, talk about the ground truth. You can do that when you actually have images, or photos of something, but in environmental science, in geoscience in general, we don't have a ground truth. We don't trust data: we don't trust data coming from models, and we don't trust data coming from sensors, from satellites. So how do we manage that? With these data assimilation models, you have the possibility, with a pre-analysis of the data, to build some weights, the so-called error covariance matrices, that you then use to balance your data; to balance how much you trust the data. What I mean is that if you trust some data a lot, these weights are small; if you trust the data less, the weights are big. So, depending on the trust you have in the data, given the pre-analysis of the quality of the data, you can ingest this information into your model; and so when you ingest data, you already know that the data are noisy, and you cannot do anything about that, the real world is the real world, right? But at least you can balance how much weight you want to give to what comes from each source of data.

That's great, okay.
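For reference, the weighting described here is the one in the standard variational data assimilation cost function (a textbook form, not a formula shown in the talk): the analysis state minimises

$$J(x) = (x - x_b)^\top \mathbf{B}^{-1} (x - x_b) + \big(y - \mathcal{H}(x)\big)^\top \mathbf{R}^{-1} \big(y - \mathcal{H}(x)\big),$$

where $x_b$ is the model background, $y$ the observations, $\mathcal{H}$ the observation operator, and $\mathbf{B}$ and $\mathbf{R}$ the background and observation error covariance matrices. Small entries in $\mathbf{B}$ or $\mathbf{R}$ (high trust) make the corresponding misfit term dominate, which is exactly the "small weights for trusted data" behaviour described in the answer.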
And some of the visualisations you showed, like the street-level computational fluid dynamics, are very interesting; do you think you could use those kinds of visualisations to actually convey the uncertainties in the model, alongside the states of the model?

Oh yes, actually this is something that we do: we have the simulation, and then alongside we also have the same simulation but showing the error, so that we can better visualise where we are making bigger mistakes, let's say; this is very important. Yes, you can do that; you can do that with something called ParaView.

Sounds great, okay. Somebody, Williams, asked: again, a great talk; could you share your views and experiences of the best digital twin software?

Is that the real question? I wish I could say there is a best digital twin software, but I think it's strongly application oriented. I've been working with a lot of people: some were using, I don't know, Fluidity, others were using OpenFOAM, and people coming from operational centres, for example we are working now with the Euro-Mediterranean Centre on Climate Change, have their own software. So it's strongly application oriented, I would say; but if you are just starting, for example in environmental science with these types of simulation, maybe OpenFOAM can be an easy way in.

And there's a related question from Matthew Smith: have the tools you use for encoding models changed in recent years? Is there still lots of Fortran and C++, or are you using more modern languages and frameworks like Julia and TensorFlow?

Very, very good question, actually. I sometimes say we are the bridging community: we try to bridge the people who have been working on these applications for years, and have very solid software developed in Fortran and C++, with no easy way to translate it, and the other community that is working with even newer languages. So we struggle with that. We have seen that there are now more tools that can help the integration, but there is still no perfectly effective way to integrate the software from Fortran and C++.

Yeah, I imagine you've got lots of existing models that you need to build new data assimilation approaches into. So there's a related question from Simon, saying: building digital twins seems a lot more work in terms of the infrastructure required to support data assimilation and real-time modelling; in similar work I've found a big knowledge gap between the domain-expert modellers and the people who work on the infrastructure; how have you effectively bridged that gap?

So, straightforwardly: learning new vocabularies. I can say that the main challenge is talking to people, and I'm not joking, because sometimes we and they use completely different terminologies to mean the same stuff. I've been working now with lots of different people, from environmental science but also, for example, with doctors, right? Every time, learning a new vocabulary is painful, but it is fundamental; and then, the feedback from them, all the time. It's not about having meetings once in a while, but actually actively involving them in the discussions.

That's interesting to hear. There's a NERC project we've been working on, for an information management framework for digital twins, and those vocabulary aspects are very important to making people understand each other,
I can imagine. That's great. Well, I think that's all we've got time for today. Thanks very much, and thank you again, Rossella, for your presentation and the discussion. To remind everybody: we've recorded the session and we'll make it available soon to watch again on the website and on YouTube; and I'd remind people again to subscribe to the YouTube channel, and the link is going to go in the chat again. Just to mention very briefly that the next webinar is going to be on Friday the 14th of July, by Professor Ian Stiles from Queen's University Belfast, on AI for Biological Imaging and Sensing. So yeah, thanks again, Rossella.

Thanks, thank you, and thanks everybody.