Okay, great. All right, well, thank you, Mayne, and everyone for the opportunity to present here today. The majority of my presentation is about a body of work that we investigated and wrote a paper about around two years ago. It focuses at the data level, a little deeper down, on how to work with data effectively and interoperably and improve the quality of access and the experience for users. At the end of the talk, I'll cover where we're at now and how we're trying to extend that model to some additional things. I should also note right away that this has been a very big group effort with many of my colleagues at NCI. It's the brainchild of Ben Evans and Jingbo Wang, who's on maternity leave at the moment. The team did a great job getting the paper written and submitted; that's the paper Mayne and I shared with the announcement today. So, a little bit of background about NCI for those of you who aren't as familiar. We're the National Computational Infrastructure here in Australia, so a big part of our focus is traditional HPC usage, but we're also a data repository. These slides are now about two years old, so at the time we hosted 10-plus petabytes, and that's even bigger now. We also span a very wide range of domains in terms of the data we host: a very large part is climate, coasts, and oceans, but we also reach into geophysics, astronomy, bioinformatics, et cetera. So the approaches and best practices we try to share with our users really have to be broad enough to cover this full range of domains. This relates, I think, to one of the questions that came up in the earlier talk; it's really at the forefront of our minds when we try to put together an approach. The big takeaway is that we want to maximize the usage and experience of our users and ensure they have a seamless way to get to the data.
And that really all comes back to this quality factor. Our users are looking to do things like combine data and visualize it, and they'd like this to be as easy as possible. So again, what are the motivating goals? We want value: it's a lot of work to manage data and metadata, as I'm sure everyone here has experienced, and having put all that effort into management, you'd like at the end of the day to get positive feedback and results from that hard work. All right, also a little about where we're at. Our collections are accessed in several different ways, and this comes back to putting together an approach that works for traditional direct access on a file system, for local users on the repository. We also have web and data services served through a lot of our cloud facilities and data portals, as well as virtual laboratories. All of these have to be considered when we think about our quality standards. And as if that weren't complicated enough, when we come back to the really wide range of domains we host, each domain has complicating factors to consider. We have many gridded data sets, but also a lot of non-gridded ones, with different coordinate reference projections and resolutions; all of these become a really important part of the picture when we start to put together the standards we require. So now I'll go into what this data quality strategy looks like. It's modeled on a data maturity approach published through AGU, which is what motivated the work for us. We took that and thought, okay, how can we apply it to the data we host here? This is a very simple schematic of the big components for us.
We have an underlying high-performance data format to consider. This might change over time, but we want to choose something flexible and robust enough to be ready for HPC use, and also for the other usages I mentioned earlier, like cloud services, portals, and virtual labs. We have to work closely with the data custodians and providers, planning and deciding how the data will be accessed and what's needed for that particular collection. Then we work through data quality control: we try to make sure we're consistent with any recognized community standards. That's the big one for us, making sure we're compliant. And then, for anything we serve through our facility, we want to make sure it actually works across our platforms, tools, and services. That's the big final check: if we deliver data, we want to be sure it's useful and not breaking. So I'll speak a little to each of these points. We started by looking at our climate, ocean, and weather data. It's a very large part of our collection, and two years ago these were some rough numbers to give you a sense of how many petabytes we're talking about in these domains. Those communities traditionally use a format called NetCDF, so that was our go-to: let's start there, and that'll be the model we begin with. So that's our underlying high-performance data format. This is a fairly well-known slide from a few years ago, from what we refer to as the National Environmental Research Data Interoperability Platform. The main takeaway, to simplify, is the levels of complexity that build up from the choices you make at the underlying data format layer: what's required in terms of the API layers that have to work with it, and likewise what types of conventions then have to work on top of that.
And then you get into services and tools. And finally, if you've made the choices at those bottom layers appropriately, the hope is that the user communities, tools, services, portals, et cetera, then fit with that model. So we tried to put together checks that address the different levels in this figure. Working with the collaborators, custodians, and managers is of course a big part. Then we get to compliance. For the NetCDF collection, some very useful community standards already exist, so we chose to jump on those and extend checkers we could use in our workflow. These include the Climate and Forecast (CF) conventions, which we use as our main checker of the data contents within the file, plus some additional ones for more traditional metadata. And of course, at the higher level of catalogue collections and catalogue records, we have the ISO standards. So you can see there are a few different levels of standards to consider when you get down into the actual data file. This next figure comes from a survey we did of all our holdings before we started this body of work. Even focusing just on climate data using NetCDF, we noticed a big spread in the conventions used by the community. So even where strong community standards existed, they weren't heavily adopted. That was another motivating factor for us to build this into our workflow and set it as a best practice: to ensure that those publishing with us followed these standards, and to get that 'none' column down as small as we could, which would be a big priority. And I think I've spoken enough on these, so I'll move along.
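To give a feel for what a standards-compliance check does, here is a minimal sketch in Python. Real checkers (the CF convention checker and the metadata checkers mentioned above) inspect actual NetCDF files; in this simplified stand-in, the file's metadata is represented as plain dictionaries, and the required-attribute lists are illustrative examples, not the full standards.

```python
# Simplified sketch of a CF-style metadata compliance check.
# The attribute lists below are illustrative, not the complete
# CF or discovery-metadata requirements.

REQUIRED_GLOBAL_ATTRS = ["Conventions", "title", "summary"]  # illustrative
REQUIRED_VARIABLE_ATTRS = ["units", "long_name"]             # illustrative

def check_metadata(global_attrs, variables):
    """Return a list of human-readable compliance findings.

    global_attrs: dict of file-level attributes.
    variables: dict mapping variable name -> dict of its attributes.
    """
    findings = []
    # File-level (global) attributes.
    for attr in REQUIRED_GLOBAL_ATTRS:
        if attr not in global_attrs:
            findings.append(f"missing global attribute: {attr}")
    # The Conventions attribute should declare which CF version is followed.
    if "CF-" not in global_attrs.get("Conventions", ""):
        findings.append("Conventions attribute does not declare a CF version")
    # Per-variable attributes.
    for name, attrs in variables.items():
        for attr in REQUIRED_VARIABLE_ATTRS:
            if attr not in attrs:
                findings.append(f"variable {name}: missing attribute {attr}")
    return findings
```

An empty findings list means the (simplified) check passed; otherwise each entry is a suggestion a data provider can act on, which is essentially what our reports summarized.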
Like I said, we took these standards and extended various checker scripts and routines so we could adopt them at repository scale; some modifications were needed, which went into the work we did. Reporting was also really important for us, because we wanted to take the output of these checkers, and any other quality information we were looking at, and summarize it for our data providers in a way that made it easy to go and make the suggested changes, and that also served as a reference so they could see the value of the improvements they'd gain from putting in the work to make corrections. So we put some time into thinking about how the reporting would look for us. This slide shows one of the collections we worked closely with at that time. You can see in this chart, going a few months at a time, really quick improvements as we went through this process with the community. It was actually a geophysics community rather than climate, so we adapted what we could from the climate example for them, and we saw this really nice improvement in quality in their case. The last part of the model I wanted to talk about is assurance through demonstrated tests across all our tools and services. This part was really valuable because it tested usability across everything we hosted. So what did we do? We checked commonly used libraries, including a lot of the core system-level libraries that are important across the climate and earth-system domains. We also tested the services we share this data through, making sure that things like THREDDS and, at the time, Hyrax and GeoServer all worked with the format, along with validation in different programming tools, Python, et cetera, which are really important for users, and visualization. Those are just a few of the big categories.
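The categories above can be organized as a cross-product of datasets and checks. Here is a minimal sketch of that idea; in practice each check would open a published file through the real tool or service (THREDDS, a Python library, a visualization client), whereas here the checks are stand-in callables and all names are hypothetical.

```python
# Simplified sketch of a tool/service functionality-test matrix.
# Each "check" is a callable that tries to use a dataset through one
# tool or service and returns True on success.

def run_test_matrix(datasets, checks):
    """Run every check against every dataset.

    Returns a dict mapping (dataset, check_name) -> True/False.
    """
    results = {}
    for ds in datasets:
        for check_name, check in checks.items():
            try:
                results[(ds, check_name)] = bool(check(ds))
            except Exception:
                # A crashing tool counts as a failed test, not a crashed run.
                results[(ds, check_name)] = False
    return results

def summarize(results):
    """Format pass/fail counts as a short report line for data providers."""
    passed = sum(results.values())
    return f"{passed}/{len(results)} checks passed"
```

A matrix like this is also what naturally generates reference material: every passing cell doubles as a worked example of using that dataset with that tool.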
So we had a matrix of all these tests, and we want to make sure that anything we publish works with the appropriate tools for that collection. Again, we put some work into how we would share that with our providers. Not only did this give users a positive experience, in that they could expect that, yes, our data will work with this range of tools, it also really helped us; we learned a lot through the process as well. A bonus I'd like to throw in here is that we now have a lot of feedback to provide to different local and international communities, because we did learn a lot. The functionality tests also led to a lot of reference and training material for our user community: as we put the tests together, it naturally led to material we could use and build into our library. And of course the real benefit is having something that is genuinely interoperable and follows community standards where possible, which really plays back to this quality issue. So what's next, and what's been going on? We have a very large amount of data here, and obviously not everything is NetCDF, so the goal is to extend this to other formats we host. At the moment we're putting work into what the process would look like for GeoTIFF collections, and we're staying connected and working with international communities. A big one right now is extending this framework to look at the higher level up: data management plans and metadata catalogue records, and how we can start to look at that information for all our collections and take a similar approach. And I think that's my last slide. Yeah, so thanks so much, and if there are any questions later when there's time, I'm happy to answer them.