There we are. Good morning. I won't be able to see you while I'm talking, because I need to be able to read what's in front of me. It's good to see everybody here. This is my first meeting in a couple of years, and I will just say I haven't broken some of my bad habits from previous talks, so I'll try not to be long-winded today.

I'm bringing you an update on a workshop called FAIR for US, so FAIR in the U.S. We wanted to gather community input around the implementation of FAIR, current successes and roadblocks, and what we might want to do to set a plan. We all know that we're in various stages of implementation of the FAIR data principles, and they're inspirational and aspirational aims for funders and research managers, but there's a particular aim that we're all looking for, and that's to improve the reuse of data for new scientific discovery and scholarship. There's a whole range of activities we can do and opportunities: to expand the expert data workforce in support of institutional work on public access to federally funded research; to drive AI initiatives and foster markets for new technologies and implementation services; and to develop new practices, tools, and services to create and improve the FAIRness of data and related research objects, which will improve machine readability and interoperability and facilitate decision-making about the reuse of data, whether that's for various or particular purposes and, of course, across sectors.

We were funded by NSF to do this workshop, and the plan originally was that this would happen in advance of the solicitation for FAIR and open science RCNs; I'll talk about that in a minute. What we did was put together a workshop that would allow us to think about identifying first steps or next steps and planning activities for research data, code, and workflows.
So even though we all talk about the FAIR data principles, it's really clear from our work, both across the FAIR data stewardship community and amongst our domain science colleagues, that at this point we really do need to be focusing some energy on what FAIR means for code, scientific software, workflows, and the other kinds of objects we need to have access to, not only for reproducibility, which is one slice of what we're thinking about, but really to capture the scholarly record and to facilitate reuse over time.

A secondary benefit of this workshop was an opportunity for conversations among people who don't normally sit in meetings together. In some communities, the research data management community in particular, I think we have had a lot of the same conversations over the last five years or so. That's great: we have really figured out some things we need to do and ways we need to engage our organizations and the researchers. But we also need to start to build a plan to move beyond the things that we know already and are doing already. So we aimed for diverse stakeholder and funding organization representation to explore opportunities for multi-agency efforts, and we also wanted to think about novel approaches to US competitiveness in research and development.

I'm happy to talk about any of these points on my slides. I'm going to try not to read all of them, so when it's time for Q&A I'm happy to go back to any of these. As I mentioned, we weren't sure when the solicitation was going to drop from NSF, so one of the goals was to help the broader community start to think about what a research coordination network approach would be like for building capacity and standards around FAIR.
The FAIR and open science research coordination network program seeks to create three-year RCNs that will support a broad range of activities to advance the ways that investigators can share information and ideas, coordinate research activities, foster synthesis and new collaborations, and develop community standards, all of that to advance science. So really this new solicitation is a way for NSF to begin to think about how to engage the community more broadly around some of these problems related to data, software, and open science, and the implementation of FAIR or FAIRification of materials, in order to drive science forward.

I won't spend any time at all talking about FAIR; we can do that if people have questions when we get to Q&A. But this slide is still important, I think, for organizing the conversations that we have, because FAIR is a set of principles. It is aspirational; it's not an end goal. The FAIR principles are aims, and for all kinds of science and scholarship, research data services, and related projects and products, they really sit along a continuum. We don't reach an end result and just check a box; it's really about the progress toward FAIR.
So a little bit of the novelty and innovation in our planning was that we've taken some lessons from the last couple of years of Zoom meetings and conferences. All of us, I think, have spent time sitting through full days of meetings. This is very difficult; we lose our innovation and brain power after a couple of hours. So we said, okay, we know we need to be virtual, so we'll have multiple sessions across a few days and we'll leave some thinking time in between. Our sessions averaged about two hours. We had plenary time and then discussion time. That discussion time was facilitated, and that was fundamentally important to keep us focused on the set of questions we wanted to answer. We used a set of modalities to gather input: not only presentations, discussion, and Q&A from the plenary sessions, but we also used Jamboard to facilitate brainstorming and some theme building and to work toward some operational aims, and then we used Mentimeter polls to vote and rank some of those things to move toward some community consensus.

One of the most important things: we had an advisory committee, and they were brilliant at reining me in when I was getting beyond the scope of what we were trying to do, and also in the kinds of ways we wanted to engage the community around the questions we were asking. The meeting was invitational, and we did that because we wanted to make sure we had experts in the room from a wide variety of stakeholder groups and also across domains and organization types. One of the things that we did was make sure we had money built into the grant to pay the people who were working. There are a lot of hidden costs, as we all know, to all this work we do around stewardship and curation, and while we weren't able to pay people at salary levels, of course, we felt that providing honoraria for participation in the delivery of the workshop was quite important.

Okay, so what did we do in process? We set the stage;
we organized what we were going to do at the start of each meeting. We considered a range of existing models and initiatives. Across all four of the sessions, we spent time discussing, identifying, and prioritizing the aims or actions that we might take, based on what we heard and the organizing questions for that session. And we'll be developing a scoping report. We're doing analysis now, so some of the outputs that we're going to talk about in a minute, some of the initial outcomes or ideas, are just initial. But the way we framed all of this was: what's achievable in the next five years, and what's happening in the EU and the UK that might be most successful here in the US context? How might we adapt or adopt some of those things?

So here are the sessions that we ran. Part of the reason we spent time thinking about international initiatives is, quite frankly, that in terms of funding around FAIR initiatives, there's been a lot of funding in the EU. There's a lot of activity: there are nonprofit organizations, there are new kinds of initiatives and novel efforts at institutions. NSF was really quite interested in hearing about where there's leadership there and how we might engage and collaborate, but also bring some of those successes here to the States.

The other thing we wanted to do in terms of the domain sciences, in hearing from them about successes for community initiatives, was to have some conversation across groups that don't normally sit in the same meetings. So we had very different kinds of people from different kinds of projects and organizations, representing very different kinds of science. We met in advance with almost all of the plenary speakers and panelists (the domain group we did as a panel) in order to provide context for the whole of the workshop and also to
suggest particular shaping for their talks based on their expertise. This was really important for meeting the aims we needed to meet and for stimulating the conversations in discussion.

So, some notable points out of the plenaries. Some of this is not breaking new ground; you've heard about a lot of these individual points in various meetings and reports. But starting to put this into a coherent space, to be able to think about setting a pathway forward and some aims that might be achievable at different levels of the FAIR implementation process, is really important.

On metadata: we had a couple of people talk about the biomedical sciences and a fundamental problem in that space, where we'll need not only the biomedical science experts but also people from the library sciences and information sciences and semantics experts to work together. We simply are a long way from reconciling these challenges, particularly around building knowledge graphs that have to account for the use of metadata standards designed for very different scales of analysis. One of the points here was that the ontologies very often focus on different granularities, and we need to be able to build out relationships across them to be able to link these objects together.

With respect to FAIR and open across the data life cycle, there's very little work happening there, or at least little planning or scoping of work that lets us think about this across the whole life cycle of the data. We talk about FAIR at the front end and what researchers need to do. We talk about FAIR in terms of repositories, and that's fantastic. We have lots of conversations about interoperability, probably not enough about reuse; we need a lot more research there. But we haven't figured out how to capture the scholarly record and move materials, data
objects, and software across the infrastructure that supports the stewardship life cycle. We need processes for moving data from active-use storage, for example, potentially to community-managed kinds of places, and then potentially over to archival and preservation services. We don't have good disposal and data decision-making around those processes, and we don't have well-established on-ramps and off-ramps. We don't have the kinds of handshakes, negotiations, and arrangements amongst the infrastructure to be able to move these data along the way. Now, we're starting to get some tooling in this space. PresQT, for example, is a project that was funded, I think, by IMLS and maybe also some by NSF, and it's an example of a set of tools that support the transfer of research assets from some repositories over to others. It's a nice little suite of tools. It does a lot of integrity checks, and it even has built in, or spins out to, some FAIR assessment. But there are significant limits with respect to the size of the data files and projects it can move. Given that team science is just exploding, and mid-scale and large-scale science is really growing, we need to have those sorts of tools and those arrangements and agreements to be able to do that for very large datasets.

On international models, I'm not going to spend much time here. I'll leave this slide; I'll be posting these, and of course I'm happy to answer questions later. But these informed us about some potential ways we can think about organizing and bringing some new opportunities for collaboration. Just as a quick example, the FAIR Cookbook, which was developed in the EU, was a really big hit at the meeting. There are opportunities to collaborate with the people who developed that cookbook, which is really a how-to manual for planning and implementation. There's room to grow, and they're looking for US collaborators to help build that out.

With respect to the domain initiatives we had
there represented, you can see people from digital agriculture all the way through. We had somebody from the national magnetics laboratory, which is funded by NSF (it's a set of three laboratories), dark energy and dark matter researchers, and eco and environmental science. So we covered a really wide range of science disciplines or domains with very different kinds of data problems.

In terms of models of engagement, some of the things they pointed out concerned the bottom-up efforts that we're all familiar with, which happen at the community level. Research data management and libraries tend to drive some of these things, as do some of the professional societies. The bottom-up efforts can really be responsive to research needs, but community-level work cannot accomplish everything. We also need to think about some top-down, very large-scale consensus building and amplification of activities. You can imagine, for example, an international neuroscience organization starting to adopt the language of FAIR and really helping amplify the data standards that are available to neuroscience researchers, to push those practices out and make data much more easily not only findable but reusable.

Some of the things we still need to think about, and this won't be a surprise to those of you who know my work: there will always be disciplinary differences for which we need individualized or localized solutions. For some disciplines, for example, the data are collected for very specific purposes. There'll be a community of practice for whom those data are going to be readily reusable, but the work and cost of translating the representation of those data out for communities farther and farther from the original community of practice is very difficult and may not be cost-effective. So I think we need to be thinking about how we organize and apply the work
around FAIR in ways that are going to make sense for the best data reuse we can provide.

Also, there's still a lot of proprietary work or instrument-specific code and formats. This came up in the dark matter and dark energy group: there are many instruments, and people still write their own code. This makes reanalysis of data and combination of data very, very difficult. This is a place where the community is really coming together and developing software to help solve some of these problems. And finally, in machine learning and AI-based research, there are fundamental problems with finding the code, trained models, and training data.

Okay, so this is a quick example of some of our process. I'm not going to spend any time on the slide; I just want to give you a quick snapshot of what this looked like as we were working along the way. There was no meaning to the color coding here; people just used different colored sticky notes. But we started to look for ways that people were tallying up items that were important and getting votes. We did that in a couple of ways, and this was one of them.

So, a few more takeaways from the discussion sessions. Here is an example of a question that we started out with; in fact, this was the first question from the domain space. We wanted to understand where we can start: what can we do that could have the broadest impact with the least amount of resources required now? Interestingly, the response was that there is no more low-hanging fruit. We're in a place now where we have to bring together multiple communities with different kinds of expertise to address the problems and move toward FAIR; this is highly complex.

One of the groups in particular identified and categorized a set of barriers. I won't read these barriers individually; again, these won't be a surprise to many of
you. But being able to start to take these things and organize a roadmap, if you will, for how we move forward, and set some aims over the next five years, will really help the community move forward and inform the community as they're developing proposals and starting to put together the research coordination networks. Our scoping report will hopefully be very helpful for all of that. And I'd say we have a need for a kind of decadal plan; in fact, our report will include that. We need some sort of plan that helps us think about addressing some of the disjointedness of the policies and approaches across the funding agencies, and we need to start to think about how our plans and responses can address some of those things.

Okay, so I'm at 21 minutes. I'm going to stop and take some Q&A. There are some good examples here, and again, my slides will be available. The last points I want to make are just some reminders. FAIR doesn't address quality; that will live with the disciplines and domains and the scholarly communication system. Security is becoming increasingly important: should we be thinking about adding this into data management plans, or at least into planning? It's not related to FAIR, but I think we need to bring security into the conversation. And we've been trying for a long time to figure out what goes where and for how long, and we still need to spend a lot of time working on that, particularly in the context of reuse.

Okay, I'll leave it there. Thank you very much. Here's the team; I appreciate all of their help. The scoping report will be forthcoming. Thank you.

Hi. Could you talk a little bit more about what you just mentioned on the last slide about security being present in the DMP? Are you saying that you're not seeing people address it in the backup section of DMPs, or are you talking about some other aspect of security?

Yes, I'm talking about some other aspects. Security and cybersecurity are now
fundamental to the ways that we have to think about organizing where data goes, how it's represented, and access controls. For a lot of scientists, and as we heard yesterday (there was a just wonderful panel on security and privacy), we're at a place where I think we need to think about how to integrate overall security of the data content. There's a lot of research where security doesn't matter at all, but there's a tremendous amount of work happening, particularly work that may not even fall under IRB-controlled or regulated data, but data that's sensitive. It may be that we have programmatic work happening; we produce a lot of data, and we're not necessarily thinking about how we manage all of that. So it may not fall under a DMP; it may be part of some other data management realm within the university. But I think we need to start thinking about how it fits into these planning processes, whether it's the DMP or some other cyberinfrastructure collaboration plan. I know it's sort of orthogonal to what we were talking about with FAIR, but it's time for us to start to think about how to integrate all of that.

Yeah, thank you.

Thank you for a really helpful overview of this. I have a sort of trivial question and then a real question. The trivial question: I'm really looking forward to reading this report. Do you have any sense of the time frame when it's likely to be available?

Yes. We aim for the first week of May for the full draft. We're going to get some comment on our first full draft, and then I hope it's out by June 1st, if not before.

Okay, thank you. I'll be sure to try and share that out to the CNI announcement list when it's ready. My real question: at least in some of the discussions I've heard, in mostly European circles, about FAIR and how to implement it, there's been a fairly rapid recognition that it has to be
dealt with on a disciplinary, or even really almost sub-disciplinary, basis, where it's kind of easy to come up with standards for certain series of datasets that are widely shared or reused, but the more general you make it, the harder it gets. I'm wondering how much of that you've heard in this workshop, about generality versus very small data-sharing communities.

Yes, we did, particularly in the domain session. That's a great question, Cliff. A quick example is that around findability, we want to move toward standards of data citation. But for some communities, the practice of writing out a typical reference that goes in the reference list is simply not the way that community produces its work, or the way the scientists use those references in their reuse of the publications. In the neurosciences, for example, data citations are generally found within the body of the paper. So we need to do two things. Yes, we want data citation, but we need to think about the format of the data citation that fits differently in different parts of the paper, and then we need the tooling to be able to discover those citations or references within the body of the paper if they're not in the reference list. So that's a very specific example with respect to the scholarly communication space, findability, and data citation.

With respect to data itself, there were two examples I gave that I can expound on. One is in environmental science and earth sciences data, where they do field research. They're collecting very specific kinds of granular-level data for specific questions, and it's very difficult to think about standards for representing those data beyond the immediate use they were designed for. And in the machine learning space, and this
is where, I think, as we move toward new digital platforms, the Emerald lab example we heard from yesterday was fantastic, because they are producing a tremendous amount of very standardized data, but then there's a set of processes that have to happen as those data move from the instruments out to other platforms and then beyond. Again, those kinds of things will be specific to the data generated by the specific group or for the specific instrument, just as an example. So, yes.

Thank you.

Okay, well, I know it's lunchtime, so I appreciate you all being here. It's good to see you. Thank you very much.