Ray Yusushin is the Director of Library Collections and Digital Services at Texas State University Library, and he is talking to us about digital scholarship ecosystems for open science. Thank you, Ray.

Thank you, Melanie. First of all, I just want to thank the organizers of the conference. It's been really interesting in these last two days to see the plurality of viewpoints and things going on with both data and open science. My name is Ray Yusushin, I am Director of Collections and Digital Services for Texas State University Libraries, and I'm going to be talking about digital scholarship ecosystems for open science and a system that we've developed pretty successfully here at Texas State over the last five years.

So what is a digital scholarship ecosystem for open science? The way we define it, it's a network of several software components that supports research faculty and graduate students, with the purpose of raising research profiles. The larger idea here is simple: co-locating open source digital components in a networked research ecosystem enables larger connections and network effects, raising the value for the researcher. What are the general characteristics of the digital scholarship system I'm describing? One, open source software; two, active developer communities; and three, customizable components. So this is a system suitable for any university or research institution that is looking to build this type of infrastructure.

This digital scholarship ecosystem consists of six open source software components: two primary, a research data repository and a digital collections repository, and four tertiary components, an electronic thesis and dissertation management system, an identity management system, open academic journal software, and user interface content management software. Together, these digital ecosystem components enable the academic research cycle. On an abstract level, this has to do with the quality assurance and dissemination of knowledge; on a pragmatic level, it has to do with the discovery, gathering, and analysis of research, and the writing, publishing, sharing, and impact of research.

The number one component for open science is, of course, the research data repository. For this we use Harvard's Dataverse, which we have configured as a platform for publishing and archiving Texas State's university research data. Dataverse allows you to publish and track your own data, discover and reuse other data, and explore the repository through a search mechanism. The second, equally important primary component is the digital collections repository, and here we use the original MIT product, DSpace. This organizes, centralizes, and makes accessible the research and knowledge generated by the institution's research community: essentially, the research faculty and graduate students. And I was really happy to see all of the talk about preprints, because that's one of the areas DSpace is very good at.

In terms of tertiary components, we use OJS, Vireo, Omeka, and ORCID, and I'll go through those. OJS is open access academic journal software for refereed online journal creation. Vireo is our electronic thesis and dissertation management system; it bridges the graduate school review process and connects to the collection and data repositories. Omeka is our user interface and content management software, and it provides a gateway for complex research projects. And finally, ORCID is our researcher identity management system, giving researchers a unique ORCID iD to disambiguate names and allow aggregation of research profiles.
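To make the search mechanism Ray describes concrete, here is a minimal sketch of querying a Dataverse installation's native Search API from Python. The `/api/search` endpoint is part of Dataverse's documented API; the server URL below is hypothetical, so substitute your own installation's address.

```python
# Minimal sketch: query a Dataverse installation's Search API.
# The /api/search endpoint is part of Dataverse's documented native
# API; the BASE_URL below is a hypothetical placeholder.
import requests

BASE_URL = "https://dataverse.example.edu"  # hypothetical host

def search_datasets(query: str, per_page: int = 10):
    """Return matching dataset entries from the Search API."""
    resp = requests.get(
        f"{BASE_URL}/api/search",
        params={"q": query, "type": "dataset", "per_page": per_page},
        timeout=30,
    )
    resp.raise_for_status()
    # The response wraps results as {"status": ..., "data": {"items": [...]}}
    return resp.json()["data"]["items"]

if __name__ == "__main__":
    for item in search_datasets("water quality"):
        print(item.get("name"), "-", item.get("global_id"))
```

For content that is not yet public, the same endpoint also accepts an API token via the `X-Dataverse-key` request header.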
Now, together, these research ecosystem components open up amazing possibilities for digital scholarship and collaboration, ranging from simple levels, such as creating digital libraries, to more complex levels, such as creating online exhibits and online academic journals, to very complex levels, where you put all of the pieces together to create multimedia archives and more complex cognitive cartographies.

We've done a lot of assessment of this system over the last five years, and the results have been very impressive in terms of annual growth in usage, downloads, number of items, and the overall growth of the system. These measures have been both quantitative and qualitative; on the qualitative side, our faculty and students have been very happy with the system, especially the graduate students, for getting their research out there.

And with that, I'll begin to conclude my lightning talk by giving you some further references for a deeper dive: a paper written about this system that was presented earlier this year, and a set of working examples of the site that you can click through to see exactly how it works. If you are interested in implementing this type of system, I have put further links here to all the open source software and downloads discussed. And again, this is all completely open source, so there is no vendor involved. I would be glad to take questions and comments, and if you don't have any right now, I'd also be happy to answer questions if you contact me. Thank you again for attending my presentation.

Thank you, Ray. We have a minute or two for questions. Does anybody have a question for Ray? You can send me a message or you can raise your hand. Okay, I do have a question, actually, from one of my colleagues in the library, and we're curious: how big of a team does it take to implement something like this? It's really impressive.

Well, that's a great question. In terms of the core system, it's not a difficult system to implement; it takes two people to start with. To start this kind of system, you're going to need a programmer who doubles as a system administrator, and a digital collections librarian. The larger idea there is that the digital collections librarian takes up the marketing and the front end, helping and shepherding faculty and graduate students, while the systems person mounts the servers with the software and sets up all of your configuration. As the system begins to grow and becomes too much for one person, the digital collections librarian, to manage, you'll expand the team. Right now we've got, I have to say, about 12 people working on the system, because there's just been a lot of demand for it, and a lot of interest in everything from the data repository to the digital collections repository. But it's a great question, and it's great to hear that there's so much interest.

Okay, thank you, Ray. And again, if anybody thinks of any other questions for Ray, please feel free to message me or put them in the Slack.
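As a concrete illustration of the name disambiguation Ray mentioned, here is a minimal sketch of pulling a researcher's public record from the ORCID public API. The `pub.orcid.org` v3.0 endpoint is ORCID's documented public API, and the iD used is ORCID's well-known demonstration record; the exact response field names shown are an assumption to verify against the API documentation.

```python
# Minimal sketch: fetch a researcher's public ORCID record via the
# v3.0 public API. The iD below is ORCID's well-known demonstration
# record (Josiah Carberry).
import requests

ORCID_ID = "0000-0002-1825-0097"  # demo record

resp = requests.get(
    f"https://pub.orcid.org/v3.0/{ORCID_ID}/record",
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
record = resp.json()

# Field names follow the v3.0 record schema; treat them as an
# assumption to check against the ORCID API docs.
name = record["person"]["name"]
print(name["given-names"]["value"], name["family-name"]["value"])
```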
Our next speaker is Rosalyn Metz. Rosalyn is the associate dean of library technology and digital strategies at Emory University Libraries, and she's going to talk about how to store digital information in a structured, transparent, and predictable manner. Hi, Rosalyn. Thank you for joining us.

Thank you. So I'm here to talk about the Oxford Common File Layout, or OCFL. As Melanie noted, my name is Rosalyn Metz and I am the associate dean for library technology and digital strategies at Emory. The Oxford Common File Layout is an open community effort to define an application-independent way of storing versioned data for long-term access. For those of you who have repositories or maintain applications, you know that making sure content stays available for the long term can be difficult. This is our lovely logo, which of course shows a hard drive coming out of the rubble at the end of the world, because that really is what repositories are often looking for: how do we keep things forever?

OCFL has a number of things that make it stand out from other ways of packaging up data. The first is completeness, meaning an application can be rebuilt from the files it stores. OCFL was built around storing data and metadata together, so that it falls in line with standards around preserving access to data. Storing the data and the metadata together also allows for ease of mapping from one system to another, with the ability to keep a picture of the past.

The next thing the specification talks about is parsability. The idea was to build something that both humans and machines can understand, to ensure the data can be understood without any software around it. In disaster recovery situations, humans should be able to understand the data without an application, and machines should be able to read the data simply by overlaying an application on top of it. We have a number of examples where simple applications can just be overlaid on top of the files and understand what's in an OCFL object. You can see here, off to one side, what we call the manifest file, which outlines what files are included in a particular OCFL object; off to the other side, you can see what a storage root might look like: all of the files that might sit in a storage root under various files and folders.

The specification was also built to ensure robustness against errors, corruption, and migration between storage technologies. One of the big driving factors for OCFL was actually migration between different applications. Libraries in particular were struggling to migrate repository data between applications: I was using DSpace, and now I want to use Samvera; that migration is actually really, really difficult. So OCFL was built to make it easier to migrate between applications, or from version A to version B of an application. There are all kinds of scenarios in which you might want to migrate your data.

Strong fixity is built in. Fixity is the idea that files are fixed in place: we know what they are, and they haven't changed when you move them from storage location A to storage location B. Data can easily be validated using the inventory.json file, which is what I showed you in the previous slide. And the data can be completely self-contained: we make no attempt to tell you how to structure your data. You structure it however you want, and we just put a lightweight wrapper around it.
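To give a feel for the inventory-based validation Rosalyn describes, here is a minimal sketch of checking fixity against an OCFL object's inventory.json. The `digestAlgorithm` and `manifest` fields follow the OCFL inventory format; this is an illustration under those assumptions, not a complete OCFL validator (the spec also covers version directories, sidecar digest files, and more).

```python
# Minimal sketch: verify file fixity against an OCFL object's
# inventory.json. Per the OCFL spec, "manifest" maps each content
# digest to the stored file paths holding those bytes, and
# "digestAlgorithm" names the hash algorithm (typically sha512).
# This is an illustration, not a complete OCFL validator.
import hashlib
import json
from pathlib import Path

def check_fixity(object_root: str) -> bool:
    root = Path(object_root)
    inventory = json.loads((root / "inventory.json").read_text())
    algorithm = inventory["digestAlgorithm"]
    ok = True
    for digest, paths in inventory["manifest"].items():
        for rel_path in paths:
            h = hashlib.new(algorithm)
            h.update((root / rel_path).read_bytes())
            # OCFL digests are hex strings; compare case-insensitively
            if h.hexdigest() != digest.lower():
                print(f"MISMATCH: {rel_path}")
                ok = False
    return ok

if __name__ == "__main__":
    print("valid" if check_fixity("my_ocfl_object") else "corrupted")
```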
The biggest benefit of the OCFL specification is that it actually outlines a method for versioning data, so applications can make changes to data while allowing the history of that data to persist over time. Changes to data are tracked, like I said, over time. We use forward delta versioning, which is similar to what GitHub does: we don't make multiple copies of data when we create new versions, which can sometimes make storing things really, really complicated. Previous versions can be reconstructed using those inventory.json files, so you get to see the full history of all the different changes you've made.

The next thing the specification talks about is storage diversity. Oftentimes folks are thinking about just one storage environment, the one they've built their application in. Storage diversity ensures that data can be stored on a variety of different infrastructures, from AWS to Azure to your local storage options. And the specification can be implemented in a way that ensures duplication of data doesn't happen, again reducing costs when you're in something like AWS or Azure.

We have a number of institutions investing in OCFL: Johns Hopkins, Cornell, Penn State, the University of Wisconsin-Madison, Stanford, the University of Technology Sydney, and Brown. They've developed a bunch of lightweight tools that can access an OCFL storage hierarchy. And there are a number of systems as well: the University of Technology Sydney has a research data portal that uses it; there is an archive in Australia that uses it for collecting data on endangered cultures; the University of Cologne uses it; and LYRASIS is using it for its new Fedora repository. If you're interested, here is some information on places you can go to find out more about how to use OCFL. And with that, I'll say thank you.

Great. Thank you, Rosalyn. Does anybody have any questions? If so, you can throw up a hand or send me a message. Okay, we do have a question coming in. The question is: what do you most need from the open science community right now, in terms of folks getting involved?

Well, you know, I mentioned that migration of content from point A to point B; we'd really love some use cases. We know that data sets in the open science community can be huge and difficult to share, and in a future version of OCFL we're looking to see if there are ways for us to track data sets that never actually live in a repository. So we'd love to hear about really large data sets that folks are using or have access to: what are the ways they're tracking them, trying to connect them to repositories, those sorts of things. And if there are open science communities that have repositories and are interested in adopting OCFL, we'd love to talk to you about that.

Great. Thank you. With that, it doesn't look like we have any more questions, but again, if anybody thinks of anything else, please feel free to put it in the Slack, or you could also go to Gather.Town next. We're going to have a 20-minute break, and then we will reconvene at four o'clock Eastern time for our last session of talks. Again, thank you so much, Rosalyn.
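To make the forward-delta versioning model from Rosalyn's talk concrete, here is a small illustration of how an inventory's `versions` and `manifest` sections let earlier states be reconstructed without storing unchanged files twice. The structure follows the OCFL inventory format, but the digests are shortened placeholders rather than real sha512 values.

```python
# Illustration of OCFL forward-delta versioning: v2 changes one file
# and does not store the unchanged content again. Digests are
# shortened placeholders; real OCFL uses full sha512 hex digests.
inventory = {
    "head": "v2",
    "digestAlgorithm": "sha512",
    # manifest: every stored bitstream appears exactly once
    "manifest": {
        "aaa111": ["v1/content/paper.pdf"],
        "bbb222": ["v1/content/data.csv"],
        "ccc333": ["v2/content/data.csv"],  # only the changed file
    },
    "versions": {
        "v1": {"state": {"aaa111": ["paper.pdf"], "bbb222": ["data.csv"]}},
        "v2": {"state": {"aaa111": ["paper.pdf"], "ccc333": ["data.csv"]}},
    },
}

def logical_state(version: str) -> dict:
    """Map each logical filename in a version to the stored path holding its bytes."""
    state = inventory["versions"][version]["state"]
    return {
        name: inventory["manifest"][digest][0]
        for digest, names in state.items()
        for name in names
    }

# paper.pdf is stored once but appears in both versions' states.
print(logical_state("v1"))  # {'paper.pdf': 'v1/content/paper.pdf', 'data.csv': 'v1/content/data.csv'}
print(logical_state("v2"))  # data.csv now resolves to v2/content/data.csv
```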