So, my name is Ge Peng, as mentioned. I am one of the co-chairs of the ESIP Information Quality Cluster, and I appreciate the opportunity to introduce to you the community guidelines for FAIR data set quality information. David will talk a little more about the ESIP Information Quality Cluster, so I will focus on the guidelines today. I don't think I need to say much to this group about why we should care about data quality and the potential impact of poor data quality. I have listed some of the main potential impacts of poor data quality, including damage to reputation, whether of individual researchers or of organizations, and impacts on decision-making, whether decisions on data use or policy and other decisions that rely on data. The last one is financial loss, which probably touches the business sector most, but for research it is also associated with low productivity, because we have to spend a lot of time cleaning data. Lost business includes data products or services, when customers become frustrated with poor-quality data that is either served directly or underlies the services. The EU set out to estimate the cost of not sharing data, and they found it to be at a minimum 10 billion euros per year. The slide indicates that data scientists spend about 80% of their time cleaning and organizing data and collecting data sets. That figure is for AI and machine-learning data scientists dealing with big data. For regular scientists like us dealing with normal data sets — I have been working with various data types for over 20 years, and I spend roughly 10% of my time collecting data, because most of the data sets I need are small and I tend to know something about them already — 10% is a good estimate.
Most of my time, once I get the data sets I need, is spent estimating their quality, organizing and co-locating them, and making them analysis-ready. So whether we put the cost of not sharing information about data quality at 60 or 70%, it compounds the cost estimate of not sharing data. And that estimated 10 billion is for the EU only; globally it is going to be a lot more, particularly, again, due to productivity loss from redundancy in assessing data quality. Part of the impact is on decisions, and some of those can be high-stakes decisions, such as those for disaster response; we have one real-life use case included in the guidelines document. Data set quality, not just data quality itself, includes the quality of the data, both input and output; the quality of software and workflows; metadata and documentation, which we are all very familiar with; and also the quality of procedures, processes, infrastructure, tools and systems, which are mostly relevant for services. So, in a nutshell, it is information about the quality of the data, metadata and documentation through the entire lifecycle of a data set. I would like to share a real-life experience on the quality of metadata. I worked on a paper comparing global numerical model forecasts and satellite data with a buoy data set. The buoy data are high quality, and we used them as a tool to estimate the biases of the satellite data as well as of the model systems. We submitted the paper and got reviews back; we addressed the reviews and everything was all set. Then one of the reviewers realized that, in the metadata record, the latitude and longitude for one of the buoys were off by half a degree. She asked the editors to forward that to me, and I thought, oh my God, the whole paper has to go down the drain. So I had to redo all the analysis and all the interpretation.
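A coordinate mismatch like that half-degree buoy error can be caught before publication with a simple cross-check of the metadata against the data records themselves. The following is a minimal sketch with invented values, not the actual check used for that paper:

```python
# Hypothetical sketch: cross-check a buoy's metadata coordinates against
# the positions reported in the data records themselves.

def position_mismatch(meta_lat, meta_lon, data_lat, data_lon, tol=0.1):
    """Return True if the metadata position differs from the data-reported
    position by more than `tol` degrees in latitude or longitude."""
    return abs(meta_lat - data_lat) > tol or abs(meta_lon - data_lon) > tol

# Invented example values: the metadata lists the buoy half a degree off.
metadata_pos = (8.0, 165.0)   # (lat, lon) from the metadata record
reported_pos = (8.0, 165.5)   # mean (lat, lon) reported in the data files

if position_mismatch(*metadata_pos, *reported_pos):
    print("WARNING: metadata position disagrees with the data records")
```

A check like this takes minutes to run and, as the story above shows, can save a full re-analysis.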
In the end, it did not impact the results, and I felt fortunate that one of the reviewers happened to be the scientist for that particular buoy; it would have been a lot worse if the paper had already been published. Normally I would talk about the needs, the challenges, and the benefits of sharing data set quality information, but today I would like to use this opportunity to focus on the guidelines themselves. I have listed those topics here, so if you are interested in learning more, you can do that afterwards. But I will touch on two things: one is the changing data paradigm, and the other, by popular demand, is the data and information quality dimensions. In the distant past, say 20 years ago, especially when I was a student, data were not readily or routinely shared. Most data users had extensive knowledge of the subject or the data set, and we had to spend a lot of time getting familiar with the data. Nowadays, many more data sets are readily available and shared, and the users have expanded to include the general public, who have little or no knowledge of the subject or the data set. One example I am very familiar with is the Reynolds global sea surface temperature. I was a modeler in my previous life; I worked with atmospheric and oceanic general circulation models as well as coupled models. At the time, the Reynolds SST was the one data set, and everybody used it. It was monthly, on a two-degree by two-degree grid. SST is a very important data set, used for initial and boundary conditions for general circulation models, and also a critical data set for validating the output of coupled models. But nowadays, for the Group for High Resolution SST — I searched a couple of days ago — there are 109 data sets.
Those are global, high-resolution, daily SST data, generated or produced by trusted sources such as NOAA and NASA; almost every big institution has one. So in the past we were lucky to find just one, and now there are more than we can handle. The question becomes: who has time to go through all the available data sets to figure out which one to use? In the past we tended to have more time to learn about a data set; now everything has to be done, like, yesterday. So sharing data set quality information is more important than ever to support informed decisions on the use of data sets. This is one of the main challenges for sharing data set quality information: data set quality is multi-dimensional, requires cross-disciplinary knowledge, and has many quality attributes. On the right side here, from Wang and Strong (1996): they collected 179 quality attributes that are useful or important to data consumers. Based on those 179 attributes, they prioritized 15 and grouped them into four dimensions: intrinsic, contextual, representational, and accessibility. I talked about those the last time I gave a presentation. In the Information Quality Cluster, we published a paper led by Rama in 2017 in which we grouped the data and information quality dimensions into four, namely science, product, stewardship, and service. I am just going to go through these really quickly. For each dimension there are about three key stages. For science, they are define, develop, and validate, so the quality attributes associated with science tend to be data accuracy, precision, and uncertainty. The product dimension measures how well the product has been produced, evaluated, and utilized, whether released publicly as a research product or delivered to data centers or repositories.
The associated quality attributes include things like completeness of the data, how well the data conform to format standards, and estimates of error sources. For stewardship, the quality attributes include completeness of the metadata, touching on how the data are curated, preserved, and accessed; here it is more about curating metadata to enable the data to be accessed. That gets to the service part, where service accessibility and timeliness of service are also very important. Again, I am just going through this really quickly; the slides will be shared, so once you get a chance, please take a look. We are also interested in feedback on this particular diagram. To give you a little background on how we started and where we are now: the initial discussion was between the ESIP IQC and the Barcelona Supercomputing Center's Evaluation and Quality Control team in September 2019. Then an announcement was made to prospective collaborators, international domain experts, for a pre-ESIP workshop in early 2020, and we held the workshop in summer 2020, with two live sessions and online resources over two weeks. The Data Quality Interest Group had a pre-workshop session and provided very beneficial feedback to our workshop, where I gave a presentation, as mentioned. The workshop explored the multiple dimensions of data and information quality, the challenges and needs, and the current approaches to capturing and representing data set quality information. We kicked off the development of the community guidelines, and a couple of calls were made to the community for the working group. The working group has since been established and has been developing the guidelines by consolidating community input and collecting community best practices.
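The four quality dimensions and the example attributes just described can be summarized in a small data structure. This is only an illustrative sketch following the talk's examples, not the exhaustive taxonomy from the published paper:

```python
# Sketch of the four data set quality dimensions (science, product,
# stewardship, service) with example attributes mentioned in the talk;
# the groupings here are illustrative, not an exhaustive taxonomy.
QUALITY_DIMENSIONS = {
    "science": ["accuracy", "precision", "uncertainty"],
    "product": ["completeness of data", "conformance to format standards",
                "estimates of error sources"],
    "stewardship": ["completeness of metadata", "curation",
                    "preservation", "accessibility"],
    "service": ["service accessibility", "timeliness of service"],
}

for dimension, attributes in QUALITY_DIMENSIONS.items():
    print(f"{dimension}: {', '.join(attributes)}")
```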
In August, we published the workshop summary with a case statement. In December, we also submitted a call-to-action statement for FAIR data set quality information to the Data Science Journal, to be included in a special collection on open science. Here are some of the principles for developing the guidelines. The approach is to adapt the FAIR guiding principles, and we are taking a whole data set lifecycle approach that includes all the different dimensions of quality. Because the quality attributes and assessment types are many and depend on fitness for purpose, we decided to make the guidelines agnostic to quality attributes and assessment types. The development of the guidelines is really for the community, by the community, through an iterative process and continued engagement, including sessions like this one, and leveraging community best practices and standards. We are not doing this from scratch: there are a lot of community best practices and international standards covering various aspects of describing quality information, and we are trying to pull them together, integrate them, and make it easier for community members who need or would like to capture and represent quality information. The development of the guidelines document is co-organized by three main entities: the ESIP IQC, the Barcelona Supercomputing Center EQC team, and the Data Quality Interest Group. I have put up this list of participants to give you a sense of the geographic coverage as well as the knowledge coverage, which spans not just data acquisition and production but also data and information management, data publishing, and services and applications. We are trying to include as many disciplines and domains as we can, to hopefully come up with something that will be beneficial to the whole community.
For the next five slides I am going to talk about the guidelines in detail. Let me check my time; I was hoping to finish early to give you more time. Guideline one is to describe the data set: we recommend including the title, the persistent identifier, the version, the publication date, the date the data set was accessed if applicable, and the usage license. This is to ensure the data set is findable and accessible. Guideline two is to utilize a structured quality assessment model — structured in the sense of being in the form of, for example, metrics or maturity matrices — and to make sure the models are versioned, publicly available, registered or indexed, and retrievable by their identifiers. This is to ensure the assessment model is findable and accessible. Guideline three is about capturing the assessment method and result in a data set level metadata record. We recommend that the record be semantically and structurally consistent, and include a description of the quality assessment, the dimension that is assessed, the method, the model structure and version, the assessment result, and the versioning and revisions of the assessment — the provenance of the assessment. This is to ensure the quality information is interoperable and reusable for machines and users. Guideline four is to ensure the information is findable, accessible, and reusable for human users: the quality information should be reported in a document, preferably following a template, published with a license, preferably CC0 or CC BY, and linked to the data set metadata so they are connected.
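To make guidelines one through four concrete, a data set level quality metadata record might look like the following sketch. All names, identifiers, and values here are invented for illustration; they are not drawn from any real registry or from the guidelines document itself:

```python
import json

# Hypothetical data set level quality metadata record illustrating
# guidelines 1-4; every name, DOI, and value below is invented.
record = {
    "dataset": {                                  # guideline 1: describe the data set
        "title": "Example Global SST Analysis",
        "identifier": "doi:10.xxxx/example",      # persistent identifier
        "version": "2.1",
        "publication_date": "2020-06-01",
        "license": "CC-BY-4.0",                   # usage license
    },
    "quality_assessment": {                       # guideline 3: method and result
        "model": {                                # guideline 2: structured, versioned model
            "name": "Example Stewardship Maturity Matrix",
            "version": "1.0",
            "identifier": "doi:10.xxxx/example-matrix",
        },
        "dimension": "stewardship",               # which dimension was assessed
        "method": "self-assessment",
        "result": {"metadata_completeness": "level 3 of 5"},
        "assessed_on": "2020-07-15",              # provenance of the assessment
        "revision": 1,
    },
    "report": {                                   # guideline 4: human-readable report
        "url": "https://example.org/quality-report",
        "license": "CC0-1.0",
    },
}

print(json.dumps(record, indent=2))
```

The point of the structure is the linkage: the assessment record points to a versioned, identifiable assessment model and back to the described data set, so both machines and human users can trace where a quality claim came from.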
The last guideline is to recommend reporting and disseminating the data set quality information through a web interface: describe the data set according to guideline one, include the assessed quality attributes, the method, and the process or workflow, and include a description of how to understand and use the information. This is to ensure the information is online, findable, and readily usable. We are in the process of completing the guidelines document to make it ready for review; with the paper revisions we pushed it back a little. The current plan is to have it ready by the middle of April, with the final version several months after that. As I mentioned, although in the working group we are trying to include domain experts covering a wide spectrum, it is likely we won't be able to include everything, so your feedback is crucial for improving the quality of the guidelines. This work would not be possible without the effort of the international FAIR data set quality information community guidelines working group; you will probably see a number of familiar names on this list, along with the names of the organizing committee for the pre-ESIP workshop. If you are interested in supporting this effort in any way, please contact us; I have listed the IQC co-chairs' email here. I think that's what I have. I have several backup slides, if you are interested in taking a look afterwards.