Hello, I'm Tom Mansfield from Plymouth Marine Laboratory, a multi-disciplinary marine research centre down in Plymouth in Devon; I'm sure many of you know of it. Today I'm going to talk about one very bounded piece of work that's just coming to a conclusion. We've had something like five months of NERC funding, under the Constructing a Digital Environment programme, to see what we can do to make the data from the Western Channel Observatory more FAIR, so that we're well positioned when new technologies and autonomy come along to increase the resolution of the data, both temporal and spatial. So what can we do to make the data more machine-interoperable, to make it scale better, to make it more findable, as we move to a world where sampling isn't weekly but continuous, from multiple sensors? If that's what you're interested in, you're in the right place.

About this presentation: I'll give a very quick overview of the motivation, a bit more about what the WCO is and why we care about FAIR now. I'll talk through the method, how we've gone about it, which is very closely linked to the fact that we've taken quite an agile approach, developing things and iterating quickly, building on some of the things from yesterday's workshop on design sprints. We've taken that philosophy and pulled together not just PML but also our partners in this project from NOC and from BODC, who helped us in those sprints to develop what we're doing. Then a quick overview of the key outputs: at the end of this piece of funding, what have we learned, where can you go to find out more, and what resources has the team made? So a reasonable amount to get through, and I'll move a little faster than I might otherwise.

First, a recap of the Western Channel Observatory. It sits just off the coast of Plymouth, and effectively it's lots of buoys and atmospheric stations, with folks going out from PML and also from the Marine Biological Association, which is co-located in Plymouth, taking measurements. I think the title yesterday was "from photons to fish", and it really is everything in between. As my colleague Matthew Palmer mentioned in his presentation yesterday, some of the data sets have been running since 1903, so it's a very long time series covering a lot of parameters. Currently that data tends to be collected by humans on something like a weekly basis. There are automated buoys, but a lot of it is: every week a boat goes out, there are nets, there are buckets of water, there are people taking samples, and there are humans putting the data into databases, writing metadata in log books and then copying it into the various places where the data survives.
We have a lot of work at PML and in other places, Matthew also mentioned the National Centre for Coastal Autonomy yesterday, looking at increasing autonomy: increasing the number of platforms and sensors, and increasing the persistence of those sensors. Effectively there's a wave of data coming towards us, and hopefully we can do something about the data before that wave really arrives. That's the supply side. On the demand side, there's also a lot of appetite for these big data sets, from machine learning and artificial intelligence, lots of smart things we can do that will benefit from the higher-resolution data, but only if we can make it machine-interoperable, if we can make it FAIR.

That shaped the aims and objectives of this very bounded piece of work: what can we do in a reasonably short space of time? The main project aim is to use the FAIR data principles to maximise the impact of this kind of high-resolution data. We broke that down into two objectives. The first is to make the data more findable and let people explore it better, through data and metadata visualisation. The second is to improve data access: once someone has found the data and is interested in it, how do they know where to go to get it? How do we give them the best version, and point them towards the persistent home of the data, given that near real-time data is quite different from data collected a hundred years ago? How do we structure it so that it's simple to step through and get the data from the right place, with the right provenance, ownership and licensing?

This project started with a very strong focus on who is using the data and why, which is why I mentioned the design-sprint philosophy. We've had a couple of workshops with our partners and the research community, and we've pulled in as many stakeholders as we could get our hands on to understand who's using the data. We focused on two main groups. The first is data contributors: mitigating the barriers to using community-standard metadata throughout the pipeline. For the scientist on the boat taking the recordings, we want to remove all the barriers, so that rather than writing the metadata down in a logbook using their own terms, they start to use, by default, terms from the NERC Vocabulary Server wherever we can get there. That way, right at the start of the data pipeline, we're using standard terminology and standard methods. The second group is data users: encouraging them to access the data from the correct place, and to understand how to cite it, how to reference it, and where to go to get it again in five years' time if they want to reuse it in the same piece of work. Those are the two main groups, though we haven't forgotten the others, hence all the stick people on the slide.

Just a quick recap of the FAIR principles, because they come up a lot in this presentation: we're looking at making the data findable, accessible, interoperable and reusable, bounded by whatever access control the data owner requires.
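To make that second objective a bit more concrete, here is a minimal, purely illustrative sketch of the kind of dataset-level record it implies. None of these field names or values are the project's actual schema (which, as mentioned later, builds on the MEDIN standard), and the identifiers are placeholders:

```python
# Illustrative only: a minimal dataset-level discovery record of the kind
# the two objectives imply. Field names and values are hypothetical.
discovery_record = {
    "title": "Western Channel Observatory: station L4 surface temperature",
    "abstract": "Weekly surface measurements collected by boat at station L4.",
    "temporal_extent": {"start": "1903-01-01", "end": "2021-06-30"},
    "spatial_extent": {"lat": 50.25, "lon": -4.22},  # approximate station position
    "parameters": ["sea_water_temperature"],         # ideally vocabulary-controlled terms
    "license": "OGL-UK-3.0",                         # hypothetical licence choice
    "persistent_id": "doi:10.xxxx/example",          # placeholder: a real DOI would go here
    "distributor": "BODC",                           # archived, quality-checked copy
    "near_real_time_source": "PML staging area",     # recent, pre-QC data
}
```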
Hopefully this is a reasonably safe audience and people know FAIR reasonably well, so we don't need to dwell on it; throughout this presentation we're using the GO FAIR formulation of the principles.

Now for the first glimpse of some of the tools we've actually made, which I'm excited about. This is a website, an improvement on the current WCO website; it's currently sitting on our development server, but it's due to go live before the end of this month, so it's coming soon, and the idea of the video is just to prove that it's a real thing, promise. One of the things we've done is improve the tools so that people can start to dig into these really big data sets, many decades of data: at a high level you can begin to see what the data is, go and look at a certain day, look at the parameters that are available, look for interesting features, and understand what's there. That's a really nice piece of functionality in itself, but as we go through these slides we'll play what I think is best called FAIR bingo: it doesn't actually win you anything, but against the FAIR principles it lets us show how each of the things we've developed links back to them. This one gets us a useful first set of ticks.

Next, a vital thing: increasing the quality of the usage metadata. All the tools shine a very clear light on what metadata is there. For instance, the tooltips you see when you hover over something lead to the metadata terminology from the community standard. And if there's a different metadata term, say someone didn't call it water temperature, or misspelled water so it's something very similar to water temperature, that will come up in addition, but with an empty space beside it, flagging a term we probably shouldn't be using. The idea is that as the data goes in, this automatically prompts scientists to use the standard terms. If they want to use a different one they can, but it raises the question of why not use the standard term anyway. That gets us quite a lot of points towards findability and interoperability: using tools and visualisations to encourage the use of standard metadata.

It's a similar story for the discovery metadata. At the top, the discovery metadata is always visualised when you're using the data; you have the option of turning it off with a little radio button, but by default it's always there. And if there are any missing metadata fields, there's an empty space, so that the community, the data users, everybody inside and outside PML, can see the quality of the discovery metadata at a glance. If things are missing, maybe that's fine, or maybe it's a problem we need to go and fix together. That earns a similar set of bingo ticks to the usage metadata.
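As a rough sketch of that nudge, a close-but-non-standard term could be matched against the controlled vocabulary something like this. The term list here is a hard-coded stand-in for what would really come from the NERC Vocabulary Server, and the function is illustrative rather than the project's code:

```python
import difflib

# Hard-coded stand-in for standard terms that would really be pulled
# from the NERC Vocabulary Server (vocab.nerc.ac.uk); purely illustrative.
STANDARD_TERMS = [
    "sea_water_temperature",
    "sea_water_salinity",
    "chlorophyll_concentration",
]

def suggest_standard_term(recorded: str, cutoff: float = 0.6):
    """Return the closest standard term to what a scientist wrote,
    or None if nothing is similar enough to suggest."""
    normalised = recorded.strip().lower().replace(" ", "_")
    matches = difflib.get_close_matches(normalised, STANDARD_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# A misspelled logbook entry still gets nudged towards the standard term:
print(suggest_standard_term("watter temperature"))  # -> sea_water_temperature
```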
Interestingly, all the plots so far have had the usual background; we now have one with a red background, because the tools now include automated quality checking. It's reasonably simple quality checking for the moment: as the near real-time data comes in, we check it against scientist-derived upper and lower limits, plus a variance test, just to make sure a sensor hasn't failed. At the very least we now have the ability that, if we receive data that could have errors in it, it gets tagged: first of all it's visualised on a red background, and the title says the data failed validation tests. We're also in the process of adding a field to the metadata to show that it failed a validation test, which helps on the reusable front. The bingo blocks aren't quite coloured in here because, while we have the framework to do the error checking, what's there is simple and we'd like to add a lot more sophistication, perhaps as future work. But at least we have the functional block for putting error checking into the data, and it runs automatically.

Similarly, we have the ability to handle missing data. This is the most boring slide I've ever presented, because it shows something missing. But if data is missing, we still show whatever discovery metadata there is, and everything falls over in a graceful way, which helps with accessibility.

Interestingly, in addition to the automated data checking, we also have a very human data check. On all the screenshots you'll see smiling faces in the bottom right-hand corner. As people use the data, the community, members of the public, scientists, if there's something good or not so good, they now have the ability to very quickly hit the smiling or frowning face, "I like it" or "I don't like it", and leave a note saying why. The idea is that it's super lightweight, with hardly any burden; hopefully someone will fill it in before they just hit the X and go looking for data somewhere else. That way we start to collect some evidence about whether what we believe is true: we think we're using community standards and building useful, user-friendly tools, and now we can find out whether that's really the case. Again, that ties back not least to the reusable principle, checking that the standards we've adopted actually work for people.
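Here is a minimal sketch of the kind of range-plus-variance check described above, assuming made-up limits, window length and flag names; the project's real thresholds are scientist-derived:

```python
from statistics import variance

# Hypothetical limits for one parameter; the real project's thresholds
# and window length would be set by the scientists.
LOWER, UPPER = 5.0, 25.0   # plausible sea-surface temperature bounds, deg C
MIN_VARIANCE = 1e-4        # a flat-lining sensor shows near-zero variance
WINDOW = 12                # number of recent samples for the variance test

def qc_flags(samples: list[float]) -> dict:
    """Run the simple automated checks: range limits per sample, plus a
    variance test over the recent window to catch a stuck sensor."""
    out_of_range = [i for i, v in enumerate(samples) if not LOWER <= v <= UPPER]
    stuck = len(samples) >= WINDOW and variance(samples[-WINDOW:]) < MIN_VARIANCE
    return {
        "out_of_range_indices": out_of_range,
        "sensor_possibly_stuck": stuck,
        # mirrored into the usage metadata so downstream users can see it
        "failed_validation": bool(out_of_range) or stuck,
    }
```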
This next one is reasonably important, I think: accessing the data. We've talked about a lot of metadata so far. What's happening in this video is that you can see the composite plots, lots of years of data, and you can download the data. Attached to the page is a short form asking who's using the data and what for, so that we can make use of that in the future, and then you hit submit. What you'll see in a second is that we have a request-handling service in the background. If you're looking at data from, I don't know, 2015 to today, the idea is that the newest data sits in a staging area at PML, where it might still change: there might be error corrections, there might be things that happen to it. Once it has passed the quality checks, it goes to our partners at BODC. So what's happened here is that you've just picked a date range; you don't know where the data lives. In the background the service goes and gets the data that's held at BODC from BODC, and there's also a piece of data at PML that isn't at BODC yet, which is served from the WCO itself. It's completely transparent to the user: they just look at the data they're interested in, say they want it, and it comes from the best place. If that's BODC, where it's had extra quality checks, where there's a really persistent identifier, where it's really not going to change, that's the best place to get it. If the record was taken yesterday and it's only in the WCO, then it comes from there. Again, that's lots of ticks in the accessible row of the FAIR bingo.
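As a sketch of that routing logic, assume a cut-over date before which data has been archived at BODC and after which it still lives in the PML staging area; the endpoints, names and date below are invented, not the project's service:

```python
from datetime import date

# Hypothetical cut-over: data up to this date has passed QC and been
# archived at BODC; newer data still sits in the PML staging area.
ARCHIVE_CUTOVER = date(2021, 1, 1)

# Invented endpoint names, purely illustrative.
SOURCES = {
    "bodc": "https://example.org/bodc/wco",       # persistent, quality-checked
    "pml_staging": "https://example.org/pml/wco", # near real-time, may change
}

def plan_request(start: date, end: date) -> list[tuple[str, date, date]]:
    """Split one user request into per-source sub-requests, so each slice
    of the date range is fetched from the best available home.
    (Boundary handling at the cut-over day is simplified.)"""
    plan = []
    if start <= ARCHIVE_CUTOVER:
        plan.append(("bodc", start, min(end, ARCHIVE_CUTOVER)))
    if end > ARCHIVE_CUTOVER:
        plan.append(("pml_staging", max(start, ARCHIVE_CUTOVER), end))
    return plan

# A request spanning the cut-over is served from both places, invisibly:
print(plan_request(date(2015, 6, 1), date(2021, 6, 30)))
```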
A couple more things, very quickly, because I have a few more in the second video. We also clarify the licensing and how you can use the data, as plainly, as clearly and as concisely as we can make it, which helps.

A very important part is the way we collect metadata from scientists, because that's the thing that drives all of it: it's Excel. We made it super low-tech, low-barrier, not scary. It's Excel. There is something smarter in the background, where the Excel sheet links to terms in the vocabulary service, so by default it nudges people towards the standard terms. But it shouldn't scare anybody who's an expert in sampling water with a bucket rather than in metadata; hopefully this is about the simplest interface we could have for extracting the metadata, and it's a massive win for interoperability to nudge right at the start.

So, conclusions. We've ticked a lot of the FAIR boxes. There are a couple of things that aren't covered yet, particularly around findability, which we're working on with our partners at BODC, so that when you download the data you also get a persistent identifier in the metadata, hopefully in the near future. So the FAIR bingo card is nearly a full house. And this is my final slide; there's a lot to get through.

Some of the key project outputs. One, we have an improved website for the users. Two, we have an improved interface that encourages scientists to use the correct terminology and standardise their data, which also gets them a lot of benefits: hopefully they can then use a single analysis tool across multiple projects and multiple teams, because the data is in the same format and has the same headers, so you can use the same pipeline scripts to process it. Three, we collect impact data throughout, so when you request the data, the service in the background notifies us at PML as to who's using the data and what it's being used for, with the GDPR elements handled, so we can get in touch and find out how you used it. And a couple of things that should be useful to the community: the reference implementation on the website, which will be there by the end of the month, and, interestingly, a reference architecture, which is available now. It gives an overview of the approach we've taken, the design decisions we've made, and the way the architecture works in the background, in a way that is hopefully completely reusable. So if anybody's doing anything similar, it's a good place to go and look: steal our ideas, reuse our ideas, improve our ideas, and please let us know about any improvements. Hopefully it's a useful resource for the community. Thank you very much.

We have time for a quick question from the hybrid audience.

Question: Did you look at different metadata schemas for the elements? Have you had to implement specific time-series schemas to allow the graphs to work automatically?

Answer: Not really, to be honest. We've done this in collaboration with our partners at BODC, James Dayliffe in particular. For the discovery metadata we use the MEDIN metadata discovery standard, and for the usage metadata, I forget the name now, it's the relevant community standard for oceanographic data. So no, I don't think we've made anything new; we link to those existing things. The link is in the simple-looking Excel interface, which goes in the background to the vocabulary service for the correct metadata terminology for those schemas. And once we have those, it's reasonably easy to work with. Thank you very much.
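As an illustration of that Excel-to-vocabulary linkage, here is a small sketch assuming the openpyxl library, with a hard-coded stand-in for terms that would really come from the NERC Vocabulary Server; the column layout and file name are invented, not the project's template:

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

# Illustrative terms; in the real template these would be sourced from
# the NERC Vocabulary Server rather than hard-coded.
STANDARD_TERMS = ["sea_water_temperature", "sea_water_salinity",
                  "chlorophyll_concentration"]

wb = Workbook()
ws = wb.active
ws.append(["sample_time", "parameter", "value"])  # hypothetical column layout

# A dropdown restricted to the standard terms: the scientist filling in
# the sheet is nudged towards the controlled vocabulary from the start.
dv = DataValidation(type="list",
                    formula1='"' + ",".join(STANDARD_TERMS) + '"',
                    allow_blank=True)
ws.add_data_validation(dv)
dv.add("B2:B100")  # apply the dropdown to the parameter column

wb.save("wco_metadata_template.xlsx")  # invented file name
```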