I'll summarise how I think we need to make progress: we need resources for a continual process of expansion, evaluation and, importantly, revision. We need to make sure that we don't treat these things as one-shot exercises; we need to be able to learn from our experiences, revisit everything, and make that easy to do. And what we really need to do is break open the data system to allow wider participation, and I'll briefly consider who would benefit from doing this.

So, first of all, my example: historical global surface temperature change. Many of you will be familiar with the most recent Intergovernmental Panel on Climate Change assessment, which was able to conclude, for the first time, that it really is us: we are causing the warming. We can see in the plot on the left that this recent warming is unprecedented over thousands of years, and on the right, in the period I'm most involved with, the last 170 years or so, we can see that we need to include human effects in the climate models (the brown colour) to make them match the observations, which we see in black. If we include only natural effects, we don't see the recent warming. So on that standard, that's great: we can use the data to conclude that humans are affecting the climate.

So, focusing in on the observations now. These two plots are the same data, but with anomalies calculated relative to different reference periods, which emphasises differences in different periods; essentially the data are the same. These are from a recent paper describing the HadCRUT5 global temperature data set. We can see the uncertainty shown by the grey bands, and we can also see differences between the different data sets, which are larger in some periods than in others.
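Since the two panels differ only in their reference period, it may help to make explicit what re-baselining involves. The sketch below is purely illustrative: the series is synthetic and the function simply subtracts the mean over a chosen reference period.

```python
import numpy as np

def to_anomalies(years, temps, ref_start, ref_end):
    """Express a temperature series as anomalies relative to a reference period."""
    years = np.asarray(years)
    temps = np.asarray(temps)
    in_ref = (years >= ref_start) & (years <= ref_end)
    baseline = temps[in_ref].mean()        # mean over the chosen reference period
    return temps - baseline

# Illustrative only: the same synthetic series re-expressed against two baselines.
years = np.arange(1850, 2021)
temps = 0.008 * (years - 1850) + np.random.default_rng(0).normal(0, 0.1, years.size)
anoms_1961_1990 = to_anomalies(years, temps, 1961, 1990)
anoms_1850_1900 = to_anomalies(years, temps, 1850, 1900)
# The two curves differ only by a constant offset, but that offset changes
# which periods look warm or cool relative to zero.
```

The underlying data are identical; the choice of baseline only shifts the curve, which is why the two plots emphasise different periods.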
Focusing in again, we have the sea surface temperature contribution, the marine contribution to that global time series, and this is from the HadSST4 data set. If we look at the time series on the top left, we can see that the unadjusted data, shown in red, are a lot cooler in the earlier period, so there are some substantial bias adjustments that we need to make to the data. When we've done that, we have three different estimates from three different organisations: the Met Office, NOAA and the Japan Meteorological Agency each contribute data sets covering this sort of period.

We can see that we have broad agreement, but we've also got some differences, and that's shown regionally by plotting the differences between the different data sets in some of the other panels. So you can see that we don't quite understand what's going on at every level, and some of the differences extend beyond the grey uncertainty ranges. So there's more work to do: some of this is due to biases in the data, some of it is due to data sparseness. Even at the global scale we've got some work to do to improve the data.

Going back to the global data sets and focusing in on the legend, we can see that we've only got two contributors to the marine component of these data sets: in red I've outlined the global data sets that use data from the Met Office, and in blue those from NOAA. So we've got two major contributors, in the UK the Met Office, and in the US NOAA. Contrast this with a table from the appendix of the recent IPCC report summarising all the institutes that contributed historical simulations to CMIP6, the model archive that underpins the IPCC assessments. There are a lot of different models, and you'll see that in the ensemble spread presented in the analysis; I counted 40 institutions and 60 models contributing to the IPCC. So the question this raises for me is: why have we got 20 times more institutes contributing coupled climate runs than we have SST data sets? I would argue that this is largely because it's difficult to use historical data.

This is a figure from a paper we contributed to the OceanObs conference in 2019. I don't really want to focus on the detail here, but you'll see there are lots of different things you need to consider. What's also important is that we've got lots of feedbacks: lots of arrows going from the end of the process back to the start. So we're trying to build in this process of re-evaluating, repeating and improving. The top box, which I've highlighted in red, is all about choosing your data. This might be very simple, you might be using a single data source, or you might have to spend a lot of effort gathering together many different data sources, evaluating them and revisiting that choice. The second, middle red box is all about understanding and characterising the data. Once you've selected it, you need to take a look at it, understand what you've got, estimate uncertainty, see if there are any problems and fix them if you can. You need to apply a bias adjustment and evaluate consistency, and only then can you move on to constructing the product. When I talk about a product, I mean something like HadSST4, a gridded monthly data set: gridded HadSST4 is five-degree latitude-longitude boxes with monthly values. Once you've done all of these steps, you can start thinking about smoothing, gap filling, estimating the uncertainty in the gridded fields and then evaluating the result. And then you can feed the data or the products to users.
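To make concrete what "gridded, five-degree, monthly" means at its simplest, here is a rough sketch of the binning step, assuming a table of point observations with illustrative column names ('lat', 'lon', 'date', 'sst'). A real product would apply quality control, bias adjustment and uncertainty estimation before and after a step like this.

```python
import numpy as np
import pandas as pd

def grid_5deg_monthly(obs: pd.DataFrame) -> pd.DataFrame:
    """Average point observations into 5-degree lat/lon, monthly boxes.

    Assumes columns 'lat', 'lon', 'date' (datetime) and 'sst'; names are illustrative.
    """
    df = obs.copy()
    df["lat_box"] = np.floor(df["lat"] / 5.0) * 5.0 + 2.5   # box centre latitude
    df["lon_box"] = np.floor(df["lon"] / 5.0) * 5.0 + 2.5   # box centre longitude
    df["month"] = df["date"].dt.to_period("M")
    grouped = df.groupby(["month", "lat_box", "lon_box"])["sst"]
    return grouped.agg(mean_sst="mean", n_obs="count").reset_index()
```

This is an unweighted box average; keeping the observation count per box alongside the mean is one simple way of carrying sampling information forward to the uncertainty estimation stage.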
And you can actually close some of these loops: for example, if your data help improve reanalyses, you might be able to use those reanalyses to provide limits for your quality control. So there are some very strong and virtuous feedbacks here.

So, if you're constructing a gridded data set, I've put here a representation of the data volume that you're interested in. This will be a volume in space and time, and you'll have some observations in it. The variability you see in those observations is going to be a mixture of real variability, which might be at different scales, might be systematic, might be random, might be well understood or something you have very little information about, and, on top of that, variability due to data artifacts. I've listed some of the things that you might worry about under data artifacts. So the challenge, or one challenge, of constructing data sets is to look at the variability, the differences in the data you have, try to understand whether they're due to real variability or to data artifacts, and then decide how you're going to handle that in your analysis process.

So what about the data itself? Actually getting hold of data is a balance between accessibility and complexity. On the left I've shown a screenshot from the Copernicus Climate Data Store. Here you can simply click on a variable, say whether or not you want data that has passed quality control, select a year and a month, and then download your data, but this will be in a very simplified format. Underpinning the Copernicus Climate Data Store is some work we've been doing on a common data model. There's a flow chart there showing the complicated structure of this common data model, but it allows you to represent some very complex data structures in a consistent way, and I will show you later in the talk why it's important that we retain this complexity.

In the marine domain we've been very lucky that we've had an archive, ICOADS, that has collected together all the data, or a lot of the data; it aims to be comprehensive, and it has converted everything into a common format. So various coded information, and, for example, temperatures in Fahrenheit, have been translated into a standard, and again very complicated, format. If you look at the source identifiers in ICOADS, there are about 300 different sources of data, and these will have come from very many different places.
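The kind of harmonisation being described, translating units and coded values into a common representation, can be sketched very simply. The wind code table below is invented purely for illustration; it is not a real ICOADS or WMO code table.

```python
def fahrenheit_to_celsius(t_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (t_f - 32.0) * 5.0 / 9.0

# Hypothetical source record: temperature in Fahrenheit, wind as a coded value.
raw = {"sst_degF": 68.0, "wind_code": "3"}

# Illustrative decode table for the coded wind field (made up for this sketch).
WIND_CODE_MS = {"0": 0.0, "1": 1.5, "2": 3.0, "3": 5.0}

harmonised = {
    "sst_degC": round(fahrenheit_to_celsius(raw["sst_degF"]), 2),
    "wind_ms": WIND_CODE_MS[raw["wind_code"]],
}
print(harmonised)   # {'sst_degC': 20.0, 'wind_ms': 5.0}
```

The point the talk goes on to make is that the original values and code tables still need to be kept alongside the harmonised record, so that choices like this can be revisited later.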
A lot of the older data will have come from logbooks, and originally these were processed by hand, compiling information to give you charts and climatologies of various variables. Later on it was recognised that to make the data more useful it needed to be in digital form, and the first way this was done was using punch cards. I've got some examples of punch cards, and you can see there are some compromises and choices that have to be made in going from the logbook to the punch cards. The punch cards are much easier to store; you can see that in the picture of the entrance hall to one of the NOAA buildings, which is full of cabinets full of punch cards. So while this is great in that machines can actually read this data, we've still got a massive data storage problem. Moving on, we then have tapes; some of us, I started my career when tapes were used to exchange data. But even when you've converted your data from punch cards onto tapes, you've still got a lot of tapes, and they actually require a lot of maintenance: they need to be respooled, for example, or else they will eventually deteriorate. Something I was unaware of is that some of the data sources actually come from microfilm images of punch cards, and you can imagine how the data quality might be degraded by that process, but we have quite a lot of data in ICOADS that's come from microfilm versions of punch cards. Other data come in through electronic logbooks, so we have a picture of the data entry system that ships contributing to weather forecasting will have used. These came in in about the 90s: you basically typed in the numbers and they got formatted for you and sent off. And we also have delayed-mode data, for example from data assembly centres. All of these different types of data have been read in and converted to a common format.

We mustn't forget the modern data system. This is the report card from OceanOPS, which monitors the modern observing system. Some of this data, for example Argo, was designed with integrated data management; that was clearly part of how Argo was set up. But for the voluntary observing ships, for example, we rely on national archives of GTS data, so there isn't an organisation tasked with archiving and understanding that particular data source, which actually makes a huge contribution to the global climate archive.

So, digging into ICOADS. When these data were all on tape this just couldn't be done, but now we can read all the data in and simply look at it; even at this very basic level, this is the first time this has been done. We can look at where the data are, when they were recorded, what sort of platform they're from, are they ships, are they buoys, which variables we've got in the data set, what the precision of the location information is, what the sampling interval is, and whether times are in local time or UTC. All of these things are important.

Here we have an example of a summary we came up with; the example is the East India Company data. The bars show that this data comes from the late 1780s through to the late 1830s. We can see that the data are from ships, indicated by the purple rather than the pink bars, and we can also see the variables: we have pressure, air temperature and winds. The bars at the side of the map show us that the decimal places in longitude are well represented, and we have observations only at local noon. We can look at different data sources, and here we compare the East India Company data with other data of a similar period. We can see that the Maury data are much more geographically distributed, and we have more observations throughout the day; it looks like we've typically got five observations per day, which is great. We can also see that these data are from before the opening of the Suez and Panama canals, which dramatically affects the data distribution, as you can see here in some slightly later data from the Netherlands, where the data now follow the main shipping tracks, but those tracks have changed due to the opening of the Suez and Panama canals.
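The per-source summaries just described (period covered, platform types, variables present, positional precision, observation hours) are straightforward to compute once the data are readable in a common format. A rough sketch, with invented column names, might look like this.

```python
import pandas as pd

def summarise_source(obs: pd.DataFrame) -> dict:
    """Basic per-source summary of the kind described above.

    Assumes columns 'platform', 'datetime' (UTC), 'lon', 'lat' plus one column
    per observed variable; all names are illustrative.
    """
    variables = [c for c in obs.columns if c not in ("platform", "datetime", "lon", "lat")]
    return {
        "period": (obs["datetime"].min(), obs["datetime"].max()),
        "platforms": obs["platform"].value_counts().to_dict(),
        "variables": [v for v in variables if obs[v].notna().any()],
        # crude check of positional precision: how many decimal places survive
        "lon_precision": obs["lon"].map(lambda x: len(str(x).split(".")[-1])).max(),
        "obs_hours": sorted(obs["datetime"].dt.hour.unique().tolist()),
    }
```

Summaries like this are what make features such as noon-only reporting or coarse position rounding visible at a glance.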
Going back to the Netherlands data, we can also see that we've got some boxes missing, and this is actually due to tapes that were misplaced. Sometimes that means we lose the data forever, but in this case at least some of the data have appeared in a later processing, so the observations associated with these particular tapes, covering particular regions, have been found. One thing also to note is that the IDs are missing: we don't have any information about which ships these data came from. This is because the ship name was written on the back of the punch card, and the data were stored in date order for particular ships. We could have maintained the order of the data when it was digitised, but the cards were reorganised into regional tapes, so we've ended up losing that information, and for these observations we don't have a record of which particular ship made them.

Just another couple of examples. The Norwegian data are global, starting in 1867, but obviously focused around Norway, and the tuna vessel data are focused in the tropical Pacific. One thing we might note about the tropical Pacific data, which come from the tuna-porpoise logs, is that observations were only made during the day: we only have data between the local hours of six a.m. and six p.m., corresponding to daylight hours.

Even at this simple level we can look in and see some data problems. On the left, off the coast of South America, you can see a rather odd strip of data; this is data that has been mislocated by 10 degrees, and it should actually be 10 degrees further east. So just by looking at the data we can identify problems, and in this case we can fix them quite easily. Another data problem that's proved a bit more resistant is the Russian meteorological archive. Here, superimposed on the now familiar shipping lanes, we can see some geometric structures organised by five-degree boxes: we've got data missing, and we've got extra data where we don't expect it, and I'll talk a bit more about that later.

So we have these many different data sources, and we can merge them together and identify duplicates; by a duplicate I mean a different version of the same original observation. Sometimes they're easy to identify, but each different report will vary according to the journey it's made through data management: we'll have rounding, we'll have different conversions, we'll have missing parameters and metadata, and there will also be some systematic differences. I would argue that if we can reprocess from the original data sources, we can fix some of these problems and generate a better data set.

These duplicates can help us resolve data problems. This is revisiting that Russian data, and we can see that we have lots of duplicates of the Russian data with other sources. What I've shown in the maps on the right, in red, are boxes where we identify Russian data as duplicates in time and data content, but not in position, with other sources. So we can see that these five-degree boxes have been moved around, and now that we know that, we can reprocess and try to fix it, and hopefully in due course we'll be able to get a new version of the Russian data set. One reason why we don't have IDs for the Russian data set is that the processing couldn't handle the Cyrillic characters, so that's another issue that's important to consider.
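A crude version of the duplicate check just described, matching reports that agree on time and observed values but not on position, might look like the sketch below. Real duplicate identification has to allow for rounding, unit conversions and missing fields, so this is only the simplest possible illustration, and the column names are assumptions.

```python
import pandas as pd

def flag_moved_duplicates(a: pd.DataFrame, b: pd.DataFrame,
                          value_cols=("sst", "slp")) -> pd.DataFrame:
    """Find reports in `a` that match reports in `b` on time and observed values
    but disagree on position.

    Assumes columns 'datetime', 'lat', 'lon' plus the value columns; names are
    illustrative only.
    """
    merged = a.merge(b, on=["datetime", *value_cols], suffixes=("_a", "_b"))
    moved = (merged["lat_a"] != merged["lat_b"]) | (merged["lon_a"] != merged["lon_b"])
    return merged[moved]
```

Exactly this kind of comparison is what reveals that whole five-degree boxes of the Russian data have been relocated relative to other copies of the same observations.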
So we have these complex data sources; why do we need to retain this complexity? Going back to HadSST4, this shows the estimates of the bias adjustments that are needed globally, and these arise because the observations have been made in many different ways over the period. Typically the early observations were made from ships using buckets, where a sample of water was taken in a bucket; later on, although buckets still contributed, we get more and more observations from ships measuring engine inlet water and, finally, from drifting buoys.

This is a cartoon that John Kennedy produced, showing some of the things you want to think about when you're considering bucket biases. I've circled in blue the ones that are metadata, and in orange the ones we need data to understand. For example, if we have cloud information we can estimate solar radiation, which during the day will affect the temperature of the water in the bucket; we've also got cooling by evaporation, which depends on the local environmental conditions. So in addition to keeping the temperature measurements, we need to keep all these observations of other variables, because they're interesting in their own right but they also help us understand the temperature data. And if we've got information on the instructions that were provided to the observers, or information about the type of bucket, that helps as well, and if we can embed this along with the data, that will be really useful.

We also need to be able to focus on individual ships. This is taken from a paper from 1948 where they tested, in a wind tunnel, different buckets that were used by different nations at different times. Steeper curves show greater heat exchange with the atmosphere, so hopefully you can see that different data sources have very different characteristics. But the problem is, as I said earlier, that we don't have IDs for all of the data. This plot shows, shaded, the region where we have IDs, and we can see that there are quite substantial periods where we don't have ID information for our ships. My colleague Giulia Carella did some work where she actually tracked the ships, and we can do quite a good job of improving our ability to work out which observations should be clustered together as coming from the same individual ship, although you'll see some periods where this didn't work very well.
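As a very rough illustration of the idea behind linking anonymous reports into ship voyages, one simple test is whether two reports could plausibly belong to the same ship given the implied speed. The speed threshold and the report structure below are assumptions; the published track-reconstruction work is probabilistic and far more sophisticated than this.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def same_voyage(report_a, report_b, max_speed_kmh=40.0):
    """Could two (datetime, lat, lon) reports plausibly come from the same ship?

    The speed threshold is an assumption for illustration only.
    """
    dt_hours = abs((report_b[0] - report_a[0]).total_seconds()) / 3600.0
    if dt_hours == 0:
        return False
    dist_km = great_circle_km(report_a[1], report_a[2], report_b[1], report_b[2])
    return dist_km / dt_hours <= max_speed_kmh
```

Chaining plausibility checks like this along a time-ordered sequence of reports is one way of grouping observations that probably came from the same, unnamed, ship.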
In the later period, around the 1960s, we have some examples of data from the early Global Telecommunication System providing data for weather forecasts, and we can see lots of mislocations: for example, many blue circles indicating cold sea surface temperatures in the tropics where we don't expect them. This has been massively cleaned up in the archives we see now, but I'm sure we could do a better job of it today than they managed a few decades ago.

So, reminding ourselves why we care about this: the adjustments that need to be applied to global data sets depend on the measurement method and also on the environmental conditions, so we need to embed that information in the archives, but it's often missing. What can we do then? Well, because the environmental conditions and the measurement methods affect the data, we can actually look at the data and try to infer the measurement methods from the data itself.

This plot shows anomalies throughout the day, in blue, from different data sources, and it shows that we see really small diurnal cycles in the data we get from engine intakes and hull sensors, but the buckets show really strong diurnal signals. So this gives us a way into estimating metadata where we don't have good information. Duo Chan extended this: in the scatter plot we can see he's plotted anomalous diurnal amplitude against the mean offset. We saw that engine intakes are typically warmer than buckets, and they also have smaller diurnal cycles, so now we've got two indicators that can help us understand the metadata. We can also see that there are some dots that don't fall on this neat line relating the diurnal cycle to the mean offset, and the reference manual image in the bottom right shows an example of why this might be. This is a case where, when the Japanese digitised one of their data sources, they actually truncated the temperatures rather than rounding them, and Duo found that those Japanese temperatures were indeed cool.
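The truncation issue is easy to demonstrate with synthetic numbers: rounding to the nearest tenth of a degree is unbiased on average, whereas truncating always moves values downwards, producing an apparent cool bias of about half the last digit. The temperatures below are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
true_temps = rng.uniform(15.0, 30.0, 100_000)          # synthetic "true" temperatures

rounded   = np.round(true_temps, 1)                     # round to 0.1 degC
truncated = np.floor(true_temps * 10) / 10              # truncate to 0.1 degC

print(np.mean(rounded - true_temps))    # close to 0: rounding is unbiased on average
print(np.mean(truncated - true_temps))  # about -0.05: truncation looks ~0.05 degC cool
```

A digitisation choice like this can therefore masquerade as a physical bias unless the processing history is recorded alongside the data.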
So what do we do? To make progress, I'll argue that we need to expand data coverage, we need to fix the data problems we have, and we need to make sure we don't repeat the mistakes of the past. This is probably an obligatory figure, from the NERC digital strategy, showing how data are collected, gathered together and then analysed. Most of the data we're considering here fall under other data sources: we have data rescue, reprocessing, data exchange and observations of opportunity. What I mean by observations of opportunity is that we have ships contributing to weather forecasting in a formal process, but they also make observations of the environmental conditions for their own operations, and people are starting to think about how they can gather that information and whether it will be useful in providing more global information.

On expanding the data: Old Weather is an example of a data rescue portal hosted by Zooniverse, and there are many of these different sorts of activities recovering data. This is an open example, so you can contribute; this is Weather Rescue at Sea, and it's showing examples of the instructions that are provided to the people digitising the data, and this actually becomes part of our metadata. We have lots of information, and not all of it is transcribed, so it's really key that, when we've digitised this data, anybody looking at it can, if they want to, link it back to the image we have and to any metadata we have describing how the observation was made.

We can't always rely on volunteers, so this is some work that's been done at Southampton University under GloSAT, trying to automate data rescue. Here we have an example of some tabular structure; this is often what we get, and optical character recognition can often struggle to differentiate the text from the tables. So they've done some work on reliably pulling out structure from different sorts of tabulated data. The next step is to see whether, having understood that structure, we can use the characteristics of what we expect to see in each part of the table to improve the optical character recognition. So, for example, if we know what sort of data and metadata we're expecting to see in a certain part of the table, we can improve how the optical character recognition is working. And we mustn't forget the modern observing system: we don't have archives for all of the data sources that contribute the observations we use to understand climate.

So what are the data system requirements? We need to retain the original data and metadata; before we do anything else we need to be able to archive that, and I think that will require a bit more flexibility from data centres, because these things are not always going to conform to the standards that they would like. We need to keep everything together and be able to link to scanned documentation, so that if we find something interesting in the data we can look at the documentation and the metadata, or go all the way back to the original logbook or images. We want tools to be embedded in this: we need QC and visualisation, we need to improve our duplicate identification, and we need to feed all of this back into improving the data. We need tools for conversion to standards, so fixed ways of extracting wind speeds from Beaufort scales, converting units, and understanding the tables that were used to make old data formats more compact. We need access to external data sources, so we can see how our observations look in the context of other observations, and we need feedback loops to fix processing problems and data collection issues; the quicker we do this, the more likely it is to be fixed by someone who has some knowledge of the situation. And we need to be able to do this for a very wide range of data and formats.

I've tried to visualise this, and I think it is better in my head than it has actually appeared on the screen, but I've tried to show the current data system: in terms of data quality, green might be good and red not so good, different formats are different shapes, different bits are missing, and there's some documentation that may or may not be embedded in the data. The ICOADS approach was to fit this all into a standard format, so we might lose a bit of information, and we might be able to carry across some of the metadata. Things were kept, but often separately, and those might have been lost over time, and the formats that ICOADS has used are not compliant with modern metadata standards. And it's a bit of a grey box: we have detailed information about what's been done, but it's actually quite hard to piece it all together. What I'm envisaging is that we take the same data and actually manage to keep all of that information within a common data model framework. From this we need to be able to extract data in easier-to-use formats, but we need a system where we hold all of this information in its original form and can link back to the other pieces of information. So we need to keep the original data, we need to harmonise, we need really strong links to the metadata and documentation, and this all needs to be supported by open source tools, and we really need to focus on allowing for improvements.
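One way to picture the "keep everything together" requirement is a record structure in which the observed values, the measurement metadata and the links back to the original material travel as a unit. The sketch below is not an existing standard or the actual common data model, just an illustration of the shape such a record might take.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SourceLink:
    """Pointer back to the original material for a report."""
    archive: str                      # e.g. a logbook collection or deck name
    image_ref: Optional[str] = None   # scanned page, if one exists
    documentation: list[str] = field(default_factory=list)   # instructions, code tables

@dataclass
class MarineReport:
    """One observation with its metadata and provenance kept together.

    Field names are illustrative, not an existing standard.
    """
    datetime_utc: str
    lat: float
    lon: float
    observed: dict[str, float]        # e.g. {"sst": 20.0, "at": 18.5}
    method: Optional[str] = None      # e.g. "bucket" or "engine intake", if known
    platform_id: Optional[str] = None # ship ID, if it survived digitisation
    source: SourceLink = field(default_factory=lambda: SourceLink(archive="unknown"))
```

The important property is that nothing is discarded at ingest: simplified extracts can always be generated from records like this, but not the other way round.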
So I'd argue that we need some sort of hub, so that people can start asking questions: are the data I've got already in the system; can I contribute; I've got a great idea about how to improve QC, can you use it; here's some new documentation; has anybody noticed this problem, here's a fix; somebody's done some work on their archive and they've got a new version; we've got all these complicated conversion tables, and that information might already be in the system. So the idea is that we can take all these fragmented parts, put them together, and have a searchable information system that we can use to understand the data.

So, in summary: we need more data; we need resources for data rescue, both citizen science and automation; importantly, we also need to reprocess the data we already have in digital form; and we need to discover the data that we don't even know we've got. We need systems that can handle varied multivariate data with rich metadata, and, importantly, we want no loss of information. Data centres need to be resourced to contribute to this and also to provide secure archives; at the moment some, but not all, data rescue activities are firmly embedded in the data system. We also need to worry a bit more about how we store data extracted from the real-time exchange system and, increasingly, from third-party observations of opportunity. And I think it would be great if we could have a collaborative hub to bring together the data, metadata, software and visualisation tools, so we can build the knowledge base, pool effort internationally and broaden experience, so that people can contribute even if they can't complete the whole end-to-end process.

Just to summarise that again: we need a lossless data system; we need resources so that we can reprocess; it needs to be modular, so that if somebody can make a contribution in a small area it can be used; we need libraries of metadata and documentation clearly linked to the data; lots of tools; and a hub to focus the fragmented international activities we have in these areas. The advantages, then, if we lower the barriers to data use: it will allow us, and others, to use advanced statistical techniques, again broadening the pool of expertise; we'll have an expanded diversity of data products; and we'll have feedbacks from these new analyses to improve data quality. And I would argue that there will be a huge return on any investment. We've got piecemeal activities at the moment, which is always very inefficient. If we improve the data, we'll have better data for, and from, reanalysis, which will reinforce improvements in the data and help model evaluation. And at the moment people are spending research time discovering data problems which they can't necessarily fix, so if we can sort out some of these problems, research will become much more efficient. I'd like to stop there. Thank you.

Brilliant, thank you very much, that's fascinating stuff, really good. I do have a number of questions for you, Liz, if I may just work through those. The first one, thinking right back to the start, where you looked at SST anomalies over time and the discrepancies between the different SST data sets: it's striking that those don't really appear to decrease over time. You might naively expect, with better coverage and better technology for observing things, those data sets to get closer over time, but they don't. Is there a reason why those discrepancies have not come down in the more recent period?
So the issue there is how you relate the uncertainties in each period. For sea surface temperature we've actually got a great observing system now: we've got high-quality data from satellites, we've got drifting buoys. But the problem is relating the uncertainty we have now to the uncertainty in earlier periods. What we're doing at the moment is referencing things towards the end of the record, so the uncertainties appear in the right place: the uncertainty associated with the differences between the 1950s and now appears in the 1950s, rather than as an expanding uncertainty range. But we're then interested in temperature change since the pre-industrial, so we need to reference to the early data, which is actually the most uncertain. So you end up squashing the uncertainty in the period you're using as a reference, and it has to pop out somewhere.

A question about how we manage the issue of the lack of provenance of the data: is it safe to assume that methods were broadly the same from different platforms?

To some extent. We have very little information about the types of buckets that were used earlier, and each engine intake has its own character, to do with the internal plumbing of the ship. Even if we have good metadata, there's going to be a huge diversity between ships, and even if you're using the same bucket, how you use it and how long you leave it on deck are all important. So what we need is to combine the best metadata we can get with techniques that can dig into the data and learn from its characteristics. We need high-quality data where we can look and see what the effects of the environment on the data are, pull out that information, and then apply that knowledge to estimate methods and biases for other observations.

Thank you. What are the challenges of drawing together all these data sets from different sources, and how are their relative inaccuracies and different methods factored in when creating a coherent merged data set?

Well, that last problem is basically why there are so few merged global data sets. We have to work with what we have, and this is why it's so important that we get as much information as we can to describe any data source; sometimes that will be very limited. What I hope we can do over time is revisit all of the sources that have gone into ICOADS, which typically just get used; we have a few exclusions where we have particular problems, people have excluded that Russian data in certain areas, and things like that. But we really need to start looking at the data and asking: is this helping, is this really going to give us new and useful information, or should we just mark it down as not useful?
So it's really just making sure that we carry as much information across as possible, so that when we start evaluating different data sources against each other and they disagree, we can start thinking, well, we know that one's a bit compromised, we've got good metadata for this one. It's a case of trying to bring all that information together and then making some expert judgment about which data you should prefer, and often none of them are perfect. But I think the real benefits will come if we start splitting off these data management issues from the bias issues, because at the moment we're handling them all at once. That's why the work that Duo Chan did was very nice: he was trying to understand the differences he saw in terms of the physical characteristics of the biases, and he couldn't, because he was thinking these buckets are just cooling massively, when actually it was because the data had been truncated. So if we can make sure that sort of information is embedded in the data system, when people find things like that they can clearly relate them to the real culprit, rather than trying to come up with a model of the worst bucket in the world.

A question about observation data collected by commercial companies, because that's often subject to IPR or non-disclosure: how will this be addressed for archive data that's digitised, which might give rise to new IPR, and for future data collection, noting also that government data may be open, but a lot of the monitoring is contracted out to private companies?

So, up until October the UK has been contributing to the Copernicus project, but sadly after that that will cease. This is particularly important for land data, because land data is clearly owned by nations, so a lot of work has gone into making sure that the IPR, where it's known, is firmly embedded. That's something else I haven't really talked about, but it's another piece of metadata: are there usage restrictions on this data? Marine data is usually covered by WMO regulations and conventions, where data have been exchanged because they're important for safety of life at sea. That's changing: WMO data policy has actually enabled us to access more different types of data. But, for example, with the move to autonomy, a lot of the data are collected by companies and become proprietary. So I think all we can do at the moment is work with, for example, the WMO to make sure that more different types of data are covered by free exchange, but also, from a data management perspective, make sure that this is clear in the archives. So we want to push to make data more accessible and useful, but also be very clear about what we can use and what we can't. This is a problem we haven't really had for marine data because of the conventions, but I think it is raising its head, and it's always been a problem for land data.

Yeah, okay, thank you. A question about the Southern Ocean, because the Southern Ocean remains very data sparse, I think, throughout pretty much the whole period; even with Argo it's still quite data sparse.
So, I guess, how much of an issue is that, and is there any prospect for narrowing uncertainty in the Southern Ocean, or are we doing the best that we can there at the moment?

A lot of the modern data that we get from the Southern Ocean is from drifters, surface drifters, but they typically only measure sea surface temperature and pressure, so we don't have the rich multivariate data we would like from ships, but it's all helpful. Interestingly, historically we've had data from the Southern Ocean from whaling ships and also because, before the Panama Canal and the Suez Canal opened, ships had to take more southerly routes, so we have actually got periods where we do have data in the Southern Ocean. The challenge then, as in the answer to your first question, is relating that to conditions in the normal, climatological period. That's something that Tim Osborn's group have been wrestling with in the GloSAT project: they've been trying to extend the land station data back in time, and we then have this issue about what you do when you've got data available historically but you haven't got any data at that location in the modern system. So that becomes part of the data set construction side of things rather than the data system side, but making the data as accessible as possible, and searching out Southern Ocean sources, is the way forward there.

A question about standards and harmonisation of the data: how do you approach mapping old data into modern standards, and resolving issues where one old data set might not map across as easily as another? How do you tackle that?

I'll answer a slightly different question to start off with. One issue we have at the moment is that the standard data formats we're using aren't ISO compliant, so we've ended up having modern data systems providing ISO-compliant data and then having to map them back onto somewhat archaic formats, which has proved a huge headache, and it's something that would just go away if we modernised the data system. And there's always a compromise: to be able to compare different things you have to make approximations. We have varying amounts of information about ship types, for example; sometimes we have huge detail and we have to pull out the essence of the metadata, and again just do the best we can.

No, that's cool, okay. A question about tooling: what tools do you use, which ones have you found to be the best, and are they open source tools that are available to you?

So at the moment we typically have a collection of tools that have been developed by scientists and organisations working on the data. For example, in Copernicus we use the system that John Kennedy developed at the Met Office to work with the historical data.
So what we have is a collection of software that people have used, which is increasingly being made open source in the sense that it's being put in a git repository and made available for other people to use, but I think we're a little way off having things that are robust enough to be used much more generally. I think an important part of developing this system would be to bring together, for example, all of the different approaches to QC, duplicate identification or visualisation that we have, pick out the best features, and actually build a range of tools that are really robust. We have open source data, but we don't really have tools that you could just expect anybody to use and get good results from; I think it's still somewhat in the black-art area there.

Yeah, and lots of benefits from doing that. Yes. One final question, then we need to close. I just wondered how satellite data interacts with any of this: does it help you in any way with reducing uncertainties, or is that a bit circular?

Absolutely. One thing that we make a lot of use of is the ESA CCI satellite SST data, which gives daily, 0.25-degree resolution data, from Chris Merchant's team in Reading. And that really helps us unpick the variability. Where we have high-quality in situ data, we can use the satellite data as a comparison standard to understand biases in the modern data, and we can project that information back in time. An application of this data is that a lot of people, typically for variables like air temperature and humidity, want to use these observations for satellite cal/val. So there's a sort of virtuous reinforcement there: we can use the data for calibration and validation of satellite data, then we can also use the satellite data to improve the observations, and of course we can use the satellite data in blended data sets. But I've focused my attention on the in situ observations today. So, ideally, a data system that allows you to compare with many different data sources, including satellites and model output, would be fantastic.