So I'm Sean Davis. I'm at the National Cancer Institute at the National Institutes of Health. I'm a PhD oncologist, but for the last 15 years or so I've been a biomedical data scientist, and one of my research interests has been data reuse and building communities to enhance data reuse. So it's a real pleasure to introduce this session, where we're going to be talking to some of the world's best at building communities around data and data reuse. I'm not going to spend any time introducing folks. What I thought we would do is go through four to five minutes of introduction from each panelist, and after that we'll do questions and answers. So please feel free to keep the questions coming. We'll try to mine them as best we can, but we'll also try to spend some time on open discussion. All right. So with that, we'll kick off with Keith Webster from Carnegie Mellon University. Then we'll move on to Alison Specht from the University of Queensland, Australia, then Lucy Lu Wang from the Allen Institute, who we've heard from already, and then finally Ross Epstein from SafeGraph.

Okay, Sean, thank you. Can I just get a thumbs up that you can see my slides? Yeah, great. Okay, good afternoon everyone, and thank you for being part of this important event. We are always thrilled to collaborate with our colleagues across CMU, and indeed across the planet, in these events. I wanted to tee up this final session by saying a bit, at a high level, about community building. I don't have a particular project to share, except perhaps my career-long vision of the role of libraries at the heart of scientific progress. Anyone who's been following the news over the past few months will of course have been tracking headlines about the various crises brought about by the COVID-19 pandemic. And it doesn't take much searching on Google to find lots of headlines about the potential impact on higher education and on research.
And we could spend a lot of time feeling very perturbed by the crisis mode in which we find ourselves. We were just talking before we started about China, and I had slotted in a slide showing the Chinese characters for the word "crisis". John Kennedy played around with that a little, talking about a crisis as being aware of the danger but recognizing the opportunity, because, apocryphally, the two characters that make up the word "crisis" represent danger and opportunity. Wikipedia, that source of all things truthful, tells me that in fact the second character means not "opportunity" but "change point". And I actually think that in the current context that is a much more significant point for us. Because, as you can see on the slide, over the past few months... My colleague tells me that you're still seeing my title slide. So let me go back to the Chinese one, so that hopefully you can now see the Chinese characters. Can someone give me a thumbs up? Yeah, great. Okay, so this notion of crisis and the characters for danger and opportunity: I had moved from there to think about that change point, and about what we have seen over the past few months of how COVID-19 has driven digital adoption in ways that are completely unprecedented. Companies like Amazon, which were well established, and other newer entrants in the space are talking about ten years' growth in e-commerce deliveries in eight weeks. Disney launched a service and grabbed, in five months, an audience share that took Netflix seven years to achieve. And I don't think we should pretend that these changes in behavior are confined to our domestic lives. We need to think about what this might mean as we look at the big picture around data sharing and open science in the research community.
Peter Drucker warned us many years ago that in times of turbulence we shouldn't worry about the turbulence itself. Instead, we should make sure that we don't act the way we did yesterday, and think about how we might behave in the future. In libraries we need to get beyond the notion that the library is a building full of books and journals, and of course we have been doing that already. Our role in the research world has changed dramatically in recent years. And this is tremendously important, harking back to one of the earlier talks this afternoon about the volume of data and publications generated during the course of the pandemic. We heard this morning from Lily and Bo, representing Dimensions, and they list 160,000 publications on COVID-19 this year, a couple of thousand data sets, six and a half thousand clinical trials, and so on. This world doesn't fit into the world of the black-and-white library. And as we think about the post-pandemic future, we need to strike a balance between what worked before and what needs to happen if libraries and librarians are to continue to be valued partners in the research process. Those of you who have seen me talk in the past might have seen this slide before; I use it quite often. It represents a model in which digital transformation has expanded the library's traditional focus on research outcomes. We can now recognize, capture, curate, and share the artifacts and products of the research process: the activities that take place before the formal end of research and the writing up of outcomes, as well as the products of the aftermath of the research process. That positions us beautifully for a future world in which AI will dominate. Sundar Pichai from Google and Alphabet made this great point: in the 2010s we were building a world that is mobile first, and in the 2020s we will be shifting to a world that is AI first. And as we've seen and heard today from a number of speakers, that is absolutely where we're at.
And as we think about the potential of AI, we need to recognize the importance of sharing data to power the acceleration and development of AI technologies. In all of that (and this is a very subtle plug for tomorrow's event, so please come back tomorrow), open science is a critical part of the process. It fosters the culture and community of data and publication sharing that makes all of this possible. I was on a call a few months ago with some leading administrators in this country who had been surprised by the resilience and preparedness of the academic community to share data and publications during the pandemic. And they recognized that there is no going back. We have now entered a world where data sharing and open science are the norm. It's therefore important that we recognize within our communities the sort of work my colleagues at CMU have been doing, as an illustration of open science and data collaboration inside an institution. But we should also recognize that we are but one part of a global research community that is generating data and is ready to share it, to advance the world's research and to power the AI work of the 2020s. What we need to do is build the infrastructure and the communities, so that the experts in the disciplinary domains have that infrastructure ready to hand. I'm reminded of something that came out 20 years ago: an ad from Hewlett-Packard that said, "What the internet needs is an old-fashioned librarian." Time has moved on, but I think that sentiment may still be right: we in libraries are potentially at the heart of this community of data sharing. Maybe not this kind of library, but there is a library out there that is waiting for the next wave of community building, and we are ready to help. I'm going to hand over to the next speaker, and I'm happy to discuss with my colleagues as we move into open discussion. Back to you, Sean.

Thanks, Keith.
Let's switch over now to Alison, and maybe she can start by telling us about her background. Literally. Literally talking about the background.

Yes, I suppose that is a very good start to the context of my talk, which comes from my work as an environmental scientist. I've been an academic and a researcher for many years. My hair doesn't quite give me away yet, but I'm working on being brave enough to go gray. But anyway, yes. I was an environmental scientist engaged in long-term monitoring, and in large continent-wide collation of ecosystem plot data, from 1870 right through to today. This particular site is one of my own field sites; that's the wonderful Pacific Ocean just above my head. Those are the dunes; it's a coastal dune system. And this is what's called a high barrier dune swamp, which I monitored for 20 years because of water abstraction for the local human community and because of mining. So, a very contentious area. And that's one of the reasons why I'm particularly passionate about the need to conserve data, and particularly to save it in a way that it can be reused. We started monitoring this site in 1988-89. We're redoing a paper at the moment based on just three surveys. In the case of this recent paper, it's a qualitative survey of human beings and their opinions. And I must say, the comparable data in supposedly replicated surveys make up the minority of the data that were collected. You might suppose it would be different with inanimate things, but I know it's got nothing to do with the subject. It's got everything to do with discipline and rigor. So I think I'll stop there and share my slides, hopefully full screen. Can you see? Is that filling your whole entire screen and your world, or not? I've just stopped it. It should fill your entire screen now.
Okay, so one of the other things about my slide is that I've got a logo behind me, which is the Terrestrial Ecosystem Research Network (TERN). It was established in 2009, and I left my academic job, my tenure, and my superannuation because I believe so passionately in the importance of community data, data conservation, repositories, and enablers. I've been away and have now come back to TERN. What I've got here are some interesting logos on this slide that describe where I'm coming from. I'm not going to talk about a particular project, and, Sean, I'm going to have to watch the time, because I don't have terribly many slides and I could talk an awful lot. Okay, so at the moment I'm working on a couple of major projects. One is TERN. I'm really an ecosystem research analyst for it, so I dip in and dip out and make commentary and wise pronouncements. But it is great to be back to help something that I think is so important. TERN is an observatory very similar to NEON, which you can see in my lovely yin and yang diagram there. I created it for an article we did with NEON for Elephant in the Lab, which is a sort of blog from Germany. I was particularly interested in the question "build it and will they come?", as one of my friends says: you build a data repository, you build an observatory, but who is actually going to use it, and how do you engage them? The particular topic we were talking about in that blog was synthesis centres.
So I left my job as an academic to set up the first synthesis centre in the Southern Hemisphere, the one in Australia called ACEAS, the Australian Centre for Ecological Analysis and Synthesis, which has since stopped. The whole point was to use it as a way of engaging people with this big observatory, which was serving up lots of wonderful, mostly machine-generated data about the environment. The Atlas of Living Australia is a consortium of museums. DataONE is a bit of a hybrid between discovery organizations like the Atlas of Living Australia and organizations that generate their own data, like NEON or, to a large extent, TERN. I was very happily involved with DataONE, which you will see there, and if you don't know much about DataONE, I'll talk a little more about it later. But synthesis centres are, and have been, a very interesting way to help the average scientist, the kind I came from, who goes around up to their knees in swamps (which ruins your toenails, by the way) or in rainforests or wherever. They get that scientist to move from their often quite onerous, absorbing work to actually thinking about the future and sharing. Of course, on the way to being involved with TERN, I became involved with the Research Data Alliance, which is quite an inspiring thing for environmental scientists to belong to and to visit. If you have a little nagging sense of "I know I must learn more about data", the RDA is like an injection, a blood transfusion, of people who think about that sort of thing.
And latterly I've been involved in a project funded by the Belmont Forum, which ran a one-off initiative on science-enabled infrastructure and how you can achieve it. We've got a particular project there of which I'm very proud, called PARSEC. I won't talk any more about the acronym, but it's a wonderful acronym, and it isn't about a wonderful physical idea either. Okay, so just to re-emphasize, I'm particularly involved with the community. One of my relatives, a graphic designer, generated those lovely little logos on the left-hand side of the screen for me (I think it's the left-hand side for you) for a talk at a semantic web conference in Austria. One of the keynote speakers there has, I believe, been here today, which is just wonderful. The logos show the continuing gulf: in many ways, if you want to be a good scientist, you've really got to immerse yourself in what you're doing, and the difficulty is how to bridge that gap. And I pick up on what Keith said: yes, things are vastly improving. But it's interesting. One of the projects we're continuing with DataONE, in phase two of DataONE's ten years of existence, has shown (and this is letting the cat out of the bag a bit) that there was a tremendous burst of enthusiasm in the community up until 2015, and since then it has kind of stalled. So it isn't just a matter of building. We're pondering why this is the case. Maybe researchers thought, "Yes, this is a good idea, wonderful," and then, "Well, I'm still busy doing my research, and I'm not going to do much more than that." Or maybe the infrastructure and deliverables that would enable the people who generate the data to share it aren't there yet; they haven't matured enough. So I've got a bit into the data maturity pathway, not for these organisations but for individuals, and into how you facilitate that, and we're exploring it in policy. You'll
note that with DataONE there's a little picture of me there. DataONE has a member node structure, a collaborative, participatory structure for serving up information and data, and I was very pleased to have been instrumental in the Australian connection. But of course, like many such connections, they don't always last forever.

Okay, Alison, sorry to interrupt, but time. Do you mind if we move along? We'll try to share out these URLs and the slides, of course.

So should I just proceed with the next two, or shall I stop?

Do you mind if we go ahead and move on?

Yes, that's absolutely fine.

All right. I'm sorry, we just didn't give ourselves enough time here.

I'll stop sharing. There we go. I think you've got the general picture.

Yes, there's clearly a lot to talk about. Lucy, do you want to go ahead? We've already heard from you today.

Yeah, absolutely. So I think I can take a few minutes to talk more broadly about my role and my interest in open science. Again, I'm Lucy. I'm a Young Investigator at the Allen Institute for AI, which is a non-profit research center based in Seattle, Washington, founded by the late Paul Allen. The mission of the institute is to apply AI to various aspects of research, and especially to support researchers by releasing open source tools and methods, in areas from computer vision to natural language processing, as well as data resources. Specifically, I work on the Semantic Scholar team within AI2, which is what we call our institute. Semantic Scholar is a publicly available literature search engine. It attempts to make the scientific literature more discoverable for people who are interested in learning more about these subjects, and we work with publishers and other community groups to gather that
data and make it available. The thing I care about here is making sure that open access materials in the scientific literature are widely available, so that people can conduct research using them. Not all papers are open access. A lot of us have the privilege of working in universities or other systems with lots of journal subscriptions, but not all of us do. So there are many people out there who are limited in the research available to them, even though a lot of this research is publicly or governmentally funded. I talked briefly about this in my earlier talk, but in trying to make open access materials more available, one of the projects we've released is S2ORC. S2ORC is the Semantic Scholar Open Research Corpus, where we make 12.5 million open access papers readily available to the public in a structured, machine-readable way. CORD-19 is a particular use case: it's the COVID-19 Open Research Dataset, which we released earlier this year and which I talked about earlier. So maybe I can quickly discuss some of our learnings from releasing the CORD-19 data set, and some of the things we might do differently if we had a chance to do this again. For a little bit of background, the White House Office of Science and Technology Policy asked us to release the CORD-19 data set, and we were given only six days to turn out the first version. So we released this open data set, but there were lots of things wrong with the earlier versions that we had to correct incrementally. All of that is now in the data record: people have used versions of this data set that have these issues. So one thing I can't emphasize enough is that when releasing data and
systems, it's really important to document as early as possible. At the very beginning we tried to build a community around the data set using Discourse, an open source forum and chat tool. It connects people who are interested in working on the same problems, so that if, for example, they're annotating the data, they can share those annotations and reduce duplicated effort. Discourse worked fairly well at the beginning, when there was tons of energy around CORD-19. But over time, questions were being answered in Discourse that weren't very discoverable from the data set release page. So we ended up having to distribute a lot of documentation many times, through many channels, rather than from one central location. Centralizing it is very, very important. And a lot of the systems using CORD-19 and similar data sets post-process the data heavily, but it's very hard to find that information on those systems' sites. This leads to poor reproducibility: someone can come along and not really know what you did with the data, or how a system arrives at an answer to a question. So that information is really important to surface as well. The last thing I'll talk about before turning it over to Ross: earlier, in session three, I think Imran mentioned some issues around licensing of these data sets. Bringing this back to open access considerations, something we really had to think hard about for CORD-19 was making sure that we provide appropriate provenance and licensing information for the documents in the data set. We were essentially trying to release the full text of open access papers, but that information can be a little
bit unclear. For example, there can be a preprint of a paper that's subsequently published in a non-open-access journal, so we really want to choose the version of the paper that provides the most open access rights. That was a challenge in creating the data set as well. Licensing is very hard. Something I worry about a lot is the recent proliferation of non-Creative Commons licenses in the scientific literature, in the COVID-19 literature in particular. I think there may be things we could do in this space to keep that from becoming the norm going forward. Back to you, Sean.

Thanks, Lucy. Ross, I heard a few things said there that might ring true for you guys too.

Yeah, totally, especially as it pertains to licensing. I think I'm the only one on this panel coming from the commercial side of the house. As we speak, I actually found some slides I can walk through, so we can jump into a few slides too. I'm not normally much of a slide guy, but let's throw it together anyway. Let me give some context on my background to start. My name is Ross Epstein, and I'm the Chief of Staff at a company called SafeGraph. Since March I have spent a disproportionate amount of time giving away as much data as possible, to as many academics as possible. That has been very fun, very challenging, and great. Hopefully this will be fun too, and you can tell me what we've done right and wrong. So let me give some background on SafeGraph. SafeGraph is just a data company: all we do is provide data sets about places. The joke I like to make is all we do
is sell expensive CSV files, that's it. We don't build applications, we don't build models; we like to say that we power the innovators. So giving access to the academic community, and to anybody doing research for COVID-related purposes, rang true to SafeGraph's ethos. Our products focus on places, and they fall into three core data sets. First, we've got a list of points of interest in the United States, and some in Canada, where consumers can spend money or spend time. The park, the hospital, the pizza shop and the Starbucks are all points of interest. Second, we use geometry: we build a polygon for each building footprint. Third, we take what is the largest commercially available panel of mobile devices and aggregate anonymized GPS pings into foot traffic data. As you can imagine, aggregated foot traffic data for points of interest is an exceptionally useful research tool in the time of COVID. Historically we'd always had an academic program, but it was very small. If folks came over and asked for access, we would happily give it, but it wasn't something we were pushing or promoting in any way, shape, or form. We thought it was good for the ecosystem, and then realized that if we were going to do it, we might as well do it properly. So we spun up what we called, at the time, the COVID-19 Data Consortium. Of course SafeGraph was promoting and giving away our data sets for any of these research purposes. But we were also doing a lot of the hard work of bringing in data sets from other partners and other commercial data providers. That gave us, not of course one centralized repo, but a semi-centralized repo where there
were multiple different data sets available for research purposes. These spanned real estate data, credit card and debit card transaction data, and things like payroll data: a bunch of really interesting data sets that could be used for a myriad of research purposes. And of course we were providing more than just data. I think many of you on this panel know this, but data isn't the only thing; the community is also part of it. You need to make sure you're not just putting data out into the ether. There needs to be documentation with it and support with it, and you need to provide the resources and utilities so that folks can actually do interesting things with that data. Good documentation is the start of that, but I think all of us wish that good documentation alone were enough to answer everybody's questions. It's about making sure that when people have questions, they have somewhere to ask them. So that funneled into (I'm laughing now) a Slack group, and we now have up to 5,600 individual researchers all sitting in this Slack group together. Sean, I'm laughing because we've got a quote from you down there in the bottom left of this deck. But it's a great community of organizations and individuals who are working together to do a bunch of research. These are some of the partners we've been able to bring in, everything from other data providers to technology providers. So if you needed to throw all your stuff into a Databricks notebook, you had the ability to do that quickly, with some amount of free credits, so that you had compute resources if
you needed them. And what's awesome is that we've seen a huge amount of work come out of this: on the order of 100 academic research papers with some attribution to SafeGraph or the other data providers that were part of the consortium. I speak to this from a very operational basis, with less of an academic and research focus. But it was really great that, beyond academia, there's a whole lot of public sector work going on in the consortium. That ranges from the federal agencies you would know down to local municipal governments and counties, who need to understand how their constituents and community members are responding to things like social distancing measures. Are they going out into their local economies? Are they spending money and visiting these places? Are we seeing too many people at the park? These are the kinds of really interesting insights that can come about at a local and community level. That's all I've got from a content perspective. Sean, I think you're going to open it up to questions now.

All right. Lots of different perspectives; we could probably talk about any one of those points for a couple of hours. We do have a question that actually comes from one of our panelists, and I'm going to paraphrase a little bit. Alison is interested in the status of preprints, their quality, and the data associated with them. I'm going to twist that a little. In various data-intensive communities where you're trying to provide data for reuse, how do you monitor and catalog, and how do you use that to change how you approach your community? How do you
monitor the research output from those communities? I guess at least three of you probably have something to offer on that topic. Ross, you talked about it a little already.

Yeah, I'll tackle a couple of things there. One is that we made a decision, and this was an interesting tactic. When you've done all this hard work on a paper, you want to evangelize it and market it. So we offered our internal marketing and PR teams to help amplify the coverage of whatever researchers wanted to push out. Of course it's hard to keep promoting this type of stuff continuously. But especially between April and June, we had a whole lot of major tier-one press publications coming to us and asking, "What are the interesting data? What are we actually seeing out there?" And we, as SafeGraph, the entity creating these data sets, didn't have answers. We create data sets, but ultimately the academic and research communities were the ones actually deriving insights from them. So all we were doing was acting as a middleman and making introductions. Major tier-one publications like the New York Times, the Washington Post and the Wall Street Journal were coming to SafeGraph. We would just say, "Talk to the academic community. Talk to this person who had a really great paper about political partisanship and social distancing." Of course that's a very buzzworthy and clickable topic for the media, but it was a really great way to make sure that people
were promoting their work and their preprints early on, because they could get very quick feedback from the market, from the press, and from a lot of other places. It was a different tactic from what some of the more academic communities, and what you guys, might think about when it comes to preprint-based feedback. But that was one of the things we found interesting.

Is it okay if I chime in here? I think that's a really great point, Ross. Maybe I can speak from the opposite direction. I come from academia and have seen these preprints proliferate, especially in the current COVID epidemic, and maybe I want to push back a bit on the amount of press coverage given to these arguably unfinished and unreviewed works. Preprints are a complicated matter, because they differ a lot depending on what field you're used to working in. I can speak about computer science and the biomedical domain, because those are the ones I'm most familiar with. In computer science, preprints have been established for a very long time. They are generally considered slightly higher quality, because people tend to read them and immediately build on top of the work in them. In biomedicine, preprints are relatively new. That doesn't necessarily have direct implications in itself; they're just less commonly used for now. But the work being released in places like bioRxiv and medRxiv does tend to be more unfinished, and it is not reviewed. And although this work is being included in data sets like CORD-19, and possibly in resources like SafeGraph's, there's a sense that it hasn't been subject to the rigors of peer review. Peer review checks that the methods are reasonable, and that the results are reasonable given whatever methods were
applied, and that has been somewhat problematic. I've seen a number of articles published based on preprint results that were later shown not to be quite as strong, or that didn't really hold up when expanded to a larger data set. So this is definitely a problem. Another problem related to preprints that we're still trying to address is how to link a preprint with the ultimately published version of the paper. Not all preprints reach the point of being peer reviewed and published, but many do; for many, a preprint is a work in progress. That is also somewhat challenging when you have multiple versions of essentially the same publication, some of which are altered a lot before they reach their final form.

Just to put a note in: we have Richard Sever, the co-founder of bioRxiv and medRxiv, speaking tomorrow, so show up and hear his two cents' worth as well. Sorry, go ahead.

Go ahead, Keith.

Maybe I could just throw in a couple of points. Firstly, I had shown the data screen from Dimensions, which was aggregating the publications on COVID-19, and just from very crude arithmetic on my part, tens of thousands of those papers are preprints. I think that illustrates both the positive and the negative. Undoubtedly, peer review is an important part of scientific communication, and it brings about an element of trust between authors, editors, reviewers, and readers. On the other hand, preprints have really flourished during the pandemic because they are a mechanism through which researchers can disseminate their research, their findings, and their data quickly and extensively, allowing others to build upon that early work and fostering collaborations and communities, to stick with the theme of this panel. I think that as long as there is a clear health
warning that these preprints have been released without peer review, without editorial scrutiny and acceptance, and that some findings might be unreliable and potentially dangerous for public health, the scientific community has the opportunity to treat preprints as an important part of the move to open science. Perhaps we can also think about how open science can be built upon in a world where observations are being shared in real time during the experimental process, never mind written up as a preprint before peer review. Perhaps all of that ties together to say that the pandemic has had a very positive effect on collaborative research on a global scale. We're seeing institutions collaborate that have never worked together in the past, and rapid publication of ideas and findings supports not only the scientific community but also the public interest and the recognition of research at this time. But coming back full circle to show there's no easy answer, all of that has certainly led to some higher-profile retractions. Those of us who've been in the scientific communication business for a long time recognize that retractions are part of scientific communication. So I think it really is about accepting and encouraging preprint sharing, but treating it with caution. So I got there; there's always a danger in thinking out loud.

I mean, arguably, it's much healthier for the community to have a conversation about a preprint, discover that it is problematic, and have the authors modify the methods or the results before publication. Not all published papers are canon; many ultimately turn out not to be.

I was going to say, Alison, I actually have a question aimed at you. Maybe I can ask it and then you can finish; I think we're going to end up wrapping up about now. The last question was related to the need for sustainability in an era where data
sharing remains fairly constant but data consumption seems to be growing almost exponentially. For someone who's been working in the field of data sharing for 20-plus years, I'd be curious to hear your thoughts on data sharing sustainability and data consumption.

Yes, I'll try to compose a quick answer. I have no great faith in sustainability. We have a history of people setting up repositories that don't end up being used to share data, because the community is totally different or for some other reason. And hearing from some of the people involved in the CoreTrustSeal, for example, the lifespan of even the best repositories is often very limited. I was also interested in the previous discussion: there's a lot of reliance in this open data world on salaried researchers who have the freedom to share their thoughts and share their data, and I think that's quite fragile and needs concerted effort to maintain. I've got a special issue of a journal that referred to the reported work of 300 or so people, and that website has been taken down. I served up data on farm forestry and a consortium collaboration for carbon credits back in the '90s and 2000s, and as soon as I left, the university library took it down. So I think what I hear is great. I'd be interested to know, with COVID, how far that ripple effect goes beyond the medical community. This used to happen at conferences to some extent, and COVID has helped with that: for all researchers, lots of exchanges are occurring remotely without large fees to attend a conference. It's wonderful, and maybe that will continue. But I do think there needs to be a real effort to understand that the infrastructure and the processes behind this need active support. I hope that's a reasonable answer.

I think it's pretty reasonable, and very honest. Does anybody else have anything to add
there?

Well, I was going to ask, as one of the newer entrants into this community: I understand those concerns very directly, because it's hard to maintain. What do you suggest when a commercial organization realizes that it has an asset it wants to give to the broader good in some way, shape, or form? To tie two of our topics together, around the licensing that Lucy was talking about, there are license restrictions when you have a business around it. How do you suggest balancing moving exceptionally fast and getting this out as quickly as possible against working with the operational infrastructure and the ecosystems that already exist? What might you suggest to a new entrant?

Well, thanks for that question. I think one of the advantages of the kickoff of COVID, as I've just mentioned, is that it has opened up a way for many more people to participate and collaborate. You could quite happily use that collaboration, which may now be more virtual, to speed things up and to create a new evaluation procedure. Keith has mentioned the totally imperfect review process that the journals have been laboriously trying to employ, and reviewers are increasingly hard to find. The idea of the preprint, and before that there was something else where you put something up and got comments, was to get comments. But you actually have to get those comments. There's one preprint up at the moment, 18 months old, that I personally want to refer to, and it has not had a single comment. So you've got to get that community to pay attention and to see this as a new way of trying to secure your research, to speed up the outcomes of some of the good research, and to knock on
the head some of the poorer research. I think you can do it. The difficulty would be governance, because we all know that in academia there are cliques and clubs that feed off each other and reinforce their own line, so some governance and thought needs to be involved. But I think there is an opportunity there. I am concerned about people like you, Ross, and Figshare, and other wonderful initiatives: to be sustainable, sometimes it takes a big organization coming along to gobble you up. If your idea is that good, you'll retire a millionaire and go to the Bahamas, and maybe your work will continue because you had that great idea. To take something completely different, the analogy here is the microbrewing industry in Australia, where the big companies come and buy up the craft brewers. It's only the rare one, willing to live in a poor way, who will keep their idea sustainable. On the other hand, if you can get the big players to adopt some of the practices and learn, that can be another pathway. So there are two.

Maybe I could just jump in with a thought there. You remind me, Alison, of a project that I co-founded 15 years ago, the Australian Partnership for Sustainable Repositories, and we always had to be careful not to call it "retainable repositories." It was an early recognition, as we started to build institutional and disciplinary repositories, that it was pointless creating a bucket that we poured lots of stuff into without any sense of long-term responsibility and resourcing, and I think we still haven't cracked that one. It's great for a research project to say "thou shalt share thy data," but who's going to pay for that in 10 years' time, in 50 years' time? Are we going to leave that to archive.org, or are we going to have a joined-up approach? So I just wanted to mention that. I'm conscious of time, though. Sean?
All right. I apologize to the panelists for not letting us get into a lot of the meat, and to the audience for only scratching the surface here, but I'd like to go ahead and close the day. Thank you to the panelists. There are lots of ways to reach out to the panelists after this, and if you have trouble reaching them, feel free to ping one of us organizers. At this point, feel free to head over to Gather.Town for the reception. And just a reminder that tomorrow we have the Open Science Symposium; the instructions just came up in the chat. So thank you all so much, and take care.

Thank you very much, Sean. Great, thank you. Thanks, good night.