Welcome, everybody, and thanks for joining us. I'm Cliff Lynch, the director of the Coalition for Networked Information, and you've reached one of the project briefing sessions on the last day of project briefings for this track of the CNI Fall 2020 Virtual Member Meeting. We'll be doing a summing up of this track, a conversation which I'll lead on Monday at four o'clock. A few logistical things: we are recording this, and the recording will subsequently be available to the public. We're having some glitches with our closed captioning, and that may not be working properly; I'm sorry about that. We will make sure that the recording of this does have closed captioning on it. There is a chat box; please feel free to use that to comment, to introduce yourself, and to share thoughts with others. There is also a Q&A button at the bottom of your screen; please use that to pose questions as they occur to you during the presentation. At the end of the presentation we will have a question and answer session that Diane Goldenberg-Hart from CNI will moderate. So let me get on to this presentation. I am really, really pleased to have Dr. Robert Hanisch from NIST, the US National Institute of Standards and Technology, here with us. Bob is a long, long time compatriot in the effort to move research data into first-class status as a scholarly output, as something that needs to be curated. We've served together in a number of forums. He's been a pioneer in this area for a long time at NIST, and he has been leading an effort called the NIST Research Data Framework, which I think is really important. Developing these kinds of frameworks is one of the things that NIST has done for many areas of scholarly and scientific work, and beyond, by the way, into industry and other places. They tend to be very broadly based and extremely influential in all kinds of areas, not just research community practice but government practice, for example, within the US federal government.
I shared out a pointer to some of this work on CNI-Announce a couple of weeks ago, and I'd urge you to keep close track of it. So I'm just delighted that he's here to fill us in on its status and progress. Over to you, Bob.

Thank you very much, Cliff, for those kind words of introduction. I'm really pleased to be here addressing the CNI community. Before I dig into the framework, I thought I would just say a word about NIST and about the group that I lead, the Office of Data and Informatics. If you're not familiar with NIST, we are a federal agency, part of the US Department of Commerce. It's been around a long time, since 1901, when it was founded as the National Bureau of Standards; in 1988 it was renamed NIST. Even though standards are in our name, we are not regulatory; rather, we aim to build community consensus standards in all areas of science and technology. NIST is the National Metrology Institute of the United States. That means that we conduct research into the best and most advanced methods for measuring things very, very precisely and very, very accurately. We are part of a network of over 100 national metrology institutes around the world that are organized under the BIPM, the International Bureau of Weights and Measures, which is headquartered in Paris. We have 5,000 staff, most of them at our headquarters site in Gaithersburg, Maryland, but a number in Boulder, Colorado, some in Charleston, South Carolina, and a small contingent resident at Brookhaven National Laboratory on Long Island. Within NIST there are six major research laboratories. I belong to the Material Measurement Laboratory, and my Office of Data and Informatics is quite small, only about 15 of us. But I like to say that we lift beyond our weight class in the influence we have over the way that we manage data at NIST. The activities of my group are around data management.
We've developed a public data repository and a science data portal onto that repository. This was built originally for compliance with policies emanating from OSTP and OMB, but when I arrived at NIST about six and a half years ago, it just seemed to me, based on my previous experience in the astronomy community, that this was simply the right thing to do. We are also developing laboratory information management systems, systems to help curate data sets in detail, and we have been sort of on a crusade to develop and improve data management plans for all of our research staff. We also manage the portfolio of Standard Reference Data, which is a collection of databases that are the most highly vetted and most highly regarded, used widely in industry and academia around the world. We also provide consulting services in informatics and analytics, and we work actively to engage the broader community in all areas of data management, curation, and preservation. Our science data portal and public data repository looks like this: a modern website, hosted in the cloud. It provides access to all of our public-facing data sets and is actively being enlarged as we speak. One side effect of the pandemic has been that NIST researchers have been writing up and publishing data that I guess was sitting on the shelf for a while, so our rate of data publication has almost doubled in the past several months. We're implementing, as I said, laboratory information management systems. I'm sure you're all familiar with making data FAIR; the whole point of these LIMS is to have the data born FAIR just as it is coming off of a scientific instrument. Our first test case has been in the area of electron microscopy, and we now have 10 electron microscopes integrated into this system, so that an experimenter comes in, sits down at the console, clicks go, and all of the data taken in a session is automatically logged.
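The kind of automated session logging described here can be sketched very loosely in code. This is only an illustration of the idea; the instrument name, field names, and record layout below are invented for the example and do not reflect NIST's actual LIMS or schemas.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch: log an instrument session and extract a small,
# known subset of metadata fields from each file header.

@dataclass
class SessionRecord:
    instrument: str
    operator: str
    started: str
    metadata: dict = field(default_factory=dict)  # filename -> extracted fields

def extract_metadata(header: dict) -> dict:
    # Keep only the fields we know how to interpret; anything the
    # instrument cannot report must be supplied by the experimenter.
    known = ("accelerating_voltage", "magnification", "detector")
    return {k: header[k] for k in known if k in header}

def log_session(instrument: str, operator: str, headers: list) -> SessionRecord:
    # Every file produced in the session gets a timestamped record
    # with its extracted metadata, without the experimenter doing anything.
    rec = SessionRecord(instrument, operator,
                        datetime.now(timezone.utc).isoformat())
    for i, header in enumerate(headers):
        rec.metadata[f"frame_{i:04d}.dat"] = extract_metadata(header)
    return rec

rec = log_session("SEM-01", "jdoe",
                  [{"accelerating_voltage": "15 kV", "magnification": 5000}])
print(rec.metadata["frame_0000.dat"])
```

The point of the sketch is simply that capture happens as a side effect of running the instrument, which is what "born FAIR" implies in practice.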
Metadata is extracted and moved to a database where it can be examined by that researcher and their research team, and eventually included in publications with all of the associated metadata. Our Standard Reference Data portfolio has 65 databases. Most are freely available; some are subscription-based under the Standard Reference Data Act of 1968, which allows us to do cost recovery. One of our most popular SRD products is the NIST Chemistry WebBook, which has over 2 million web views per month; it's the second most popular website at NIST, after the time service. But we have a number of other databases that are accessed very, very frequently and, as I said, are used widely by industry to figure out how much steel to put in a bridge, or how thick to make pipelines, and other very fundamental aspects of the private sector. We provide a sort of in-house consultancy for data informatics and training, a growing area of course being artificial intelligence and machine learning. But being that we are a metrology institute, we are also very concerned about analysis and uncertainty quantification. And we help people get access to the computing resources that we have on campus, particularly for machine learning and AI applications. As far as external engagement goes, we participate in interagency working groups of the US government under OSTP and the NSTC. We work internationally with the Research Data Alliance, CODATA, GO FAIR, and the World Data System. I work closely with the BIPM and its International Committee for Weights and Measures, and so forth. We are very much outward-engaged, and as you'll hear more about later, we have recently begun working directly with the AAU, APLU, and ARL in this endeavor called the Research Data Framework.

So, transitioning into this: what is a research data framework? Well, I like to think of it as a map, a map of the research data space. Who is involved? What are they doing? Where are they doing it?
Why are they doing it? When are they doing it? And for those who are wanting to enter into the research data space, it basically forms a guide: if you see yourself as a such-and-such, these are the things that you need to know about; these are the organizations, these are the entities, these are the policies and practices. So it's really a resource for understanding the costs, benefits, and risks associated with research data management. There are risks in sharing data that maybe shouldn't be shared; there are also risks in losing data. So this is to provide a guide to help people figure out the best thing to do, and it's really a consensus document based on inputs and conversations amongst stakeholders. Why do we need to do this? Well, the research data system is very complex. There are many players, various funding models, various sustainability plans. People are uncertain about how long to keep data. How do we assess the quality of the data? And how do we ultimately measure the value of that research data? This is just an example: a colleague of mine in the astronomy community, Matt Turk, put together what the big data landscape looked like a few years ago. Many, many, many players: academic, institutional, for-profit, nonprofit, open source, data sources, and so forth. And this is really just scratching the surface when it comes to the research data landscape. There are many stakeholders, from government agencies to national laboratories, universities and their research libraries, scholarly publishers, professional societies, national and international collaboration organizations, standards bodies, funders both public and private, and industry in the private sector, who face within their companies the same problems we do in the academic and government sectors, especially as they acquire other companies and find that they have to do data integration and discovery across these different entities.
Of course there are the researchers, who are both generating and using data, and of course the general public. Why do a research data framework? Well, again, we want to really leverage research data to address global challenges such as the United Nations Sustainable Development Goals. This requires being able to integrate data from very disparate sources, data that were never really intended to be shared across such diverse platforms. And so reaching out to solve these problems is going to require solving serious issues around interoperability and exchange of complex data. There are many benefits to doing this framework. One is increasing research integrity: we really need to understand the transparency of the research process. But we also want to reduce costs and maximize efficiency by establishing and sharing best practices for research data management. We need to guide risk management and reduction, helping organizations to understand their risk positions and how to improve. And of course we want to increase scientific discovery and innovation by following the FAIR principles for better utilization of data. There's both a national and an international need for a research data framework. We know that data is proliferating at an exponential rate. Data management is complex and confusing, and mismanaged data has potentially dire social and economic consequences, including loss of global leadership in critical technical fields. So we need this within the US to really coordinate and establish a research data infrastructure. But the research data landscape is truly global in nature, so international collaboration and coordination is clearly necessary. We think that NIST is well positioned to lead this kind of a project because our business is about consensus building, and we are a neutral convener of diverse communities. We don't favor one technology over another, but we really try to bring together communities of practice to agree on the best way forward.
So the process we're following uses the NIST Cybersecurity Framework as a model; Cliff mentioned that NIST has been in the framework business for quite a while. We will start out with a pilot program so that we can understand some of the complexities of building out a framework in kind of a safe space. Again, this is about community consensus, not any sort of imposition from NIST. But if I am, say, a researcher, if I am a chief data officer, or if I am a research librarian, then this is what I need to know, and these are the people I need to talk with. We kicked this effort off almost a year ago now with a scoping workshop that was held at NIST, with 50 invited participants representing the stakeholder community broadly, with both US and international representation. It was led by myself and my colleague Bonnie Carroll. Some of you may be very familiar with Bonnie: she was the founder of Information International Associates, and she was the Secretary General of CODATA. We have a very prominent steering group that is helping us on the way forward; again, I think you are probably familiar with many of these individuals, representing academia, government, and funders private and public, both national and international in nature. And Heather Joseph of SPARC is your next-door neighbor at CNI. So the summary from this workshop was, one, to confirm that this research data framework was something that would really be needed by all of the stakeholders; and what we needed to do next was build up management commitment to complete the scoping, to run the pilots, and to expand the community of interest. We have settled on doing two pilot projects, hopefully getting started early in 2021, one of which will focus on a research domain, materials science.
That was chosen because materials science is a major research area within NIST, and there have already been a number of scoping efforts to understand the research data landscape within materials science, so we intend to build on those, not to replicate them. We have reached an agreement just in the past two weeks or so to work very closely with the AAU, APLU, and ARL, because they have already engaged in building a guide to the research data landscape within their community, and we want to build on that, capitalize on that, and work collaboratively together to really flesh out the research data framework. But we're looking for cooperation across government and across other organizations to really move forward with the entire framework. The framework itself is based on functions. The workshop last year and subsequent work by myself and colleagues at NIST have come up with at least these six basic functions, which form the primary components of the framework. And they're very much analogous to the research data lifecycle: envisioning; planning; generating and acquiring; processing and analyzing; using and reusing; and then ultimately preserving or choosing to discard data. Each of these functions has categories and subcategories; again, we're using terminology here that draws directly from the Cybersecurity Framework that NIST developed starting about five years ago, which has been widely adopted in the IT community. Within envisioning, you consider things such as data governance, community engagement, the data culture of an organization, and the reward structure. Within planning, there's figuring out costs, funding, and data objects, what kinds of data are being produced; here we include not only data from experiment and simulation but also software, instrumentation, and data management planning. The DMPs are a really important part of this landscape because they capture the intention of data taking.
And of course, what kinds of data formats are being produced, and what standards are being followed. In generating and acquiring: what are the sources of data? In-house experiments or computer models? For external sources, is this demographic data, is this marketing data? What kinds of experiments are being done, and how do you capture the metadata from instruments, many of which produce data only in vendor-proprietary formats? So one has to worry about data capture and recording, measurement protocols, and capturing information from experimenters that cannot be captured directly from the instrumentation. For external sources, how do we identify the provenance, and how do we harvest the metadata so that we know where this data came from and how reliable it is? And again, what formats are the data coming in? In processing and analyzing: where does this data come from, and can we trust it? What architecture are we going to use to securely manage the data? Is it hosted on premises, or is it in the cloud? What kind of software is being used? Is it commercial, or is it custom software built by scientists? What version is being used, and what stability does it have? Are we capturing workflows through electronic lab notebooks or laboratory information management systems? And ultimately, how do we publish this data? What are the long-term plans for stewardship and management of the data for use and reuse? What about intellectual property rights and restrictions? Are there licenses to be negotiated? Is data access internal, external, or both? Do you provide application programming interfaces? Do you download data, or is it possible to visit the data with your analysis tool rather than downloading large data sets?
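The contrast drawn here between downloading data and visiting it can be illustrated with a toy sketch. The in-memory dataset below stands in for a remote repository, and all of the names and numbers are invented for the example; this is not a real NIST API.

```python
# Toy contrast: bulk download vs. API-mediated ("visit the data") access.
# DATASET is an in-memory stand-in for a remote repository.

DATASET = [{"id": i, "temperature_K": 270 + i} for i in range(1000)]

def download_all() -> list:
    # Bulk download: every record crosses the wire, and the client
    # must filter locally afterward.
    return list(DATASET)

def query_api(min_temp: float) -> list:
    # API access: the selection runs where the data lives, so only
    # matching records are transferred to the client.
    return [r for r in DATASET if r["temperature_K"] > min_temp]

print(len(download_all()), len(query_api(1260.0)))  # 1000 vs 9 records
```

At the scale of modern instrument data, the second pattern is often the only workable one, which is why the question of providing programmatic interfaces matters.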
Are the analysis tools ones that derive from AI and machine learning, and if so, how do we learn to trust them and evaluate their performance? And ultimately, how do we track the impact of the data that we are using and reusing, through tracking and citation? And finally, how do we make decisions about preservation? What are the longevity requirements for data, and who pays for that longevity? What happens to data sets that become orphaned when their projects are no longer funded? How do we plan for migration of data from one medium to another? Repositories come into play here: domain-oriented repositories, institutional repositories, general repositories. All of these exist and have interactions that are important to understand. And ultimately, how do we decide about retention and disposition? What happens to data that goes dark? Is there some permanent record that it existed and of where it was used? So, where we are today: we spent much of the last year building support for the framework. Back in March, I was able to present this to the OSTP Subcommittee on Open Science and, in a very small meeting, to the director of OSTP, Kelvin Droegemeier, who promised very strong support for this effort. We developed the roadmap and structure and vetted it with our steering group. We're now looking for the funds to actually carry this out, either through collaboration and in-kind support from other organizations or from people who actually have money that they can put on the table. We want to work with professional societies and with the scholarly publishing community. We set up a website to capture all of the information related to the framework; that URL is listed here, and the site is live. For reference, these are the other major frameworks that NIST has produced in the past several years: the Cybersecurity Framework, the Privacy Framework, and the Big Data Interoperability Framework.
The Big Data Framework was started, I think, about six years ago; it now comprises seven or eight volumes, and work is still in progress. I'm hoping that our Research Data Framework is not quite such a monumental work, and that we will actually have results from our two pilot studies available within about a year's time. We've been successful in building community interest and engagement from diverse stakeholders, both nationally and internationally. The challenges that face us are, of course, resources; timeliness, because the research data ecosystem is changing rapidly; and controlling scope and scale. The strategy for moving forward, again, is to start with pilot projects to validate our approach, retune as necessary, and build wide collaboration with other federal agencies, societies, the scholarly publishing community, and of course you all in the CNI community. Please contact me if you'd like more information. We are about to issue the preliminary framework document after some updates; that will also be available from the website that I showed you before. And I'm happy to add anybody to our list of stakeholders if you just get in touch with me. So thank you very much for your attention. I appreciate it.

Thank you. Thank you so much for that talk, on such an important project; we're really delighted that you were able to share that with our community here at CNI. The floor is now open for questions. Please type your questions into the Q&A box and I will be happy to share those aloud with Bob. I have also put in the chat box the URL that Bob shared earlier for the project website. So I believe you mentioned just a minute ago, Bob, that the draft framework is going to be released for comment here shortly, is that right? Okay. So for organizations that wish to get involved, what level of involvement are you looking for? You mentioned a few in terms of monetary support; what other kinds of activities are you looking for?
We would just like to embrace the stakeholder communities as widely as possible. The AAU, APLU, and ARL are very enthusiastic about working with us, and I will work with them to help identify participants in a kickoff workshop, which we hope to hold in March of next year. But anybody who has an interest in this, either just as a follower or, even better, as a doer, we are happy to talk with you. I have interest from the American Chemical Society, for example, representing a major publisher, and from the American Geophysical Union as both a professional society and a publisher. And those threads all need to pull together between the universities, their libraries, and the publishers, because that forms a very strong ecosystem that we would like to really understand.

Good. All right. Well, anyone who is interested in participating or getting involved, please reach out. And I see we do have a question from Matt. Matt says: thank you for this presentation. You mentioned that the framework is akin to a research lifecycle concept. Did you consider other pre-existing research lifecycle diagrams, such as the DataONE lifecycle, and how does this framework build on or differ from those?

We certainly did; we have looked at probably dozens of various representations of the research data lifecycle. The one that we came up with, these six activity areas, was really the synthesis of the workshop that we held almost a year ago. And there again, many of the participants put various versions of the data lifecycle, including DataONE's, forward for us to look at. There's no one of these that I think is particularly better or worse than the others; they are all valid. We picked something that we thought would work for us, but one of the outcomes that we hope to get from the pilot projects is to refine these. If we don't have quite the right characterization, we'll change it, we'll update it, and we'll learn from going forward with the pilots. So that's a really important aspect.
We're not trying to boil the ocean here, maybe just warming up a few lakes here and there.

That sounds manageable. Thanks, Matt, for that question, and thanks, Bob, for your response. I see that we are at just about time now, so I'm going to turn off the recording, but I invite our attendees who are still with us to please stick around after I turn off the recording if you'd like to join the conversation, if you'd like to approach the podium, as it were, and ask a question or make a comment or have a chat. Please feel free to do so; I would be happy to enable your microphones. Just one last thing: thank you to our presenter, Dr. Robert Hanisch. Thank you so much for coming and talking with us here today. And thanks to all our attendees for making time out of your day to be with us here at CNI. We look forward to seeing you at other project briefings. Take care, everyone.