All right, let's begin. Hi, everyone, and thank you for taking time out of your busy schedules to be with us. We're going to be discussing a collaborative project between the Center for Open Science and the Internet Archive to automatically preserve preregistered study designs and outcomes in a format that's easy for institutions and libraries to pull, collect and share. Before we get to that, a little housekeeping. We will have a Q&A section at the end of this presentation. You're welcome to type any questions you have in the Q&A box, i.e. the chat at the bottom of Zoom, and we'll be happy to answer them at the end. During the Q&A you can also use Zoom's raise-hand feature, which is also at the bottom of the window. If you use either of those, we will see it and answer your question at the end. So with that, let's get started, if the technology will let me; it's lagging a bit. Would you mind clicking to that slide for me, Peggy? Of course. Thank you. All right, we will be sharing about a project and our work, but first we want to learn from you. So we have an initial question for this group: what disciplines does your institution support? Let's get everyone to answer that real quick and then we will continue. If your institution does not support a particular discipline, what kind of research do you conduct, or what kind of research are you interested in? All right, we'll give it another 10 seconds. All right, I'm going to end the poll. Fantastic. It seems that 20% of you support arts and humanities, 40% support engineering, 20% information and library sciences and 20% social and behavioral sciences. That's great: we have coverage across most of the disciplines, which I find exciting. All right, with that, let's go on to the next one.
If I can get to the next slide. I'm sorry, Peggy, you're going to have to drive. I'll do it for you, no worries. Thank you. First, introductions, starting with the Center for Open Science. Founded in 2013, the Center for Open Science, or COS, is a nonprofit technology and advocacy organization with a mission to increase the openness, integrity and reproducibility of research. COS gathers evidence to encourage change, provides incentives and training to embrace that change, and creates infrastructure to enable it. To that end, COS developed and maintains the Open Science Framework, or OSF, a suite of cloud-based applications that supports rigorous, reproducible science by providing collaboration, registration and data management support across the entire research lifecycle, from the moment you first ask "what if?" all the way to presenting your paper and saying, "here's my answer." With that, I want to introduce the key team members speaking today, starting with me. Hi, I'm Mark. I'm a product owner at the Center for Open Science. I listen to stakeholders, users and our engineering team to figure out their needs, goals and challenges, and what we need to develop to address them. I'm also here to talk about open science best practices, so if you ever have a question, a thought or an idea, shoot me an email and let's chat. And with that, I'll pass it over to John to introduce himself. Hi, I'm John Walls. I'm a software engineer at COS on the backend team. We own the data models and APIs for the OSF, and our team was heavily involved in building out our side of this integration, so we're really excited to share it with you.
All right, and to wrap up introductions, I'm curious: do you have an OSF account? We'll launch a poll, and like before, we'll give you about 20 seconds to respond. All right, it looks like most of us do have an OSF account, which is fantastic: about 83% do and 17% do not. I'll be interested to hear what you think about OSF and how it might best fit your goals and needs. With that, I'm going to switch it over to Peggy. Great, thanks so much, Mark. Oh, sorry, I don't have control over my screen right now. Let me see if I can get it back. All right, it's working. I'm Peggy. I'm a product manager on the web archiving and data services team at the Internet Archive, and I'd like to share a little more about who we are. The Internet Archive is a nonprofit digital library founded in 1996 with the motto "universal access to all knowledge." Our founder, Brewster Kahle, set out to build the Library of Alexandria on the web, and we preserve and provide public access to all kinds of digital materials, including websites, apps, games, music, images and books. This year we passed 70 petabytes of data. We are perhaps best known for the Wayback Machine, the largest publicly available web archive in existence, with over 550,000 users per day. We're also responsible for Archive-It, with more than 900 partner institutions archiving web-based content around the world. And more closely related to our topic today, we work with researchers, policymakers and academic institutions to harness web archive data for digital humanities research. With that, I'd love to share that my colleague Bryan Newbold is joining us today as well, and I'll let him introduce himself. Hi, I'm Bryan. I'm a software developer here at the Internet Archive. I work on crawling, preserving and cataloging scholarly materials on the web across various media and content types.
I also support researchers working primarily with our large web archive collection, doing extractions or helping create subsets of content for them. Great, thanks so much. We also wanted to give some context on the history of this project and its larger goals. Here at the Internet Archive, we're proud to work with the Center for Open Science on this project, which is funded by a two-year National Leadership Grant from the Institute of Museum and Library Services in the National Digital Infrastructures and Initiatives category. Both of our organizations are passionate about providing open infrastructure in support of research. Although researchers rate discoverability of data as one of their main concerns, and many journals require authors to publish their data, the ecosystem of platforms is fragmented and lacks technical integration and coordination with libraries. That fragmentation is a barrier to research reproducibility, and it impedes the valuable work libraries do to ensure ongoing access to the knowledge produced by their institutions. Our goal is to provide long-term access to, and connections between, research data, starting with registrations, and to harness COS's expertise in supporting and promoting open science practices and enhancing research reproducibility. As of August, IA has preserved over 80,000 registrations. This allows critical research data to be archived outside of COS's Open Science Framework and ensures perpetual storage and access. And with that, I'm going to hand off to Mark to share more about OSF registrations. Thank you. To begin that conversation, we have another poll: have you personally submitted, or helped submit, a registration? This could be on the OSF or through a different organization. Again, we'll give you 20 seconds to respond.
All right, it looks like most of us have submitted a registration, but some of us have not; it's roughly a two-to-one split, which I find fascinating. We're going to talk about registrations at a high level first and then dig into the details. Peggy, if you don't mind going to the next slide, I would appreciate it. All right. So what are registrations? A registration is a frozen, timestamped snapshot of a research project and all of the files and content that go with it. Preregistrations are a subset: the same thing, but done before data are collected and analyzed and before the results are known. The goal of a registration is transparency. When researchers preregister, they specify their research plan in advance of the study and submit it to a registry, a collection of those registrations. Submitting a research plan increases the credibility of results because you are saying up front: here's the question, here's the data I'm going to collect, and here's how I'm going to analyze it. So if you do make a change later, people know what changed and why. It also lets researchers claim ideas early in the research process. It increases rigor because you have to answer a set of questions and think things through in advance rather than ad hoc during the study, which helps you plan for potential anomalies. And lastly, it reduces the pressure toward practices such as p-hacking or HARKing (hypothesizing after the results are known), which have sometimes been embedded in research culture. OSF hosts and maintains what we call OSF Registries. We currently have the 10 registration templates you see on your screen. The first one, OSF Preregistration, is the most thorough and the most commonly used.
It has the most questions, and it really helps researchers, especially new researchers, think through everything they should consider before conducting a study. The second, Open-Ended Registration, is our most flexible and the second most commonly used. It consists of a description of the study plus a file you attach. We only have these 10 right now, and we're always expanding the available templates, but it's really challenging to cover every study design across every discipline. The open-ended option lets you still register your study and present it in a format that makes sense for your study design, your discipline and your needs. If we go to the next slide, you'll see the graph in a second; technology sometimes takes a moment. What this graph illustrates is that since the release of OSF Registries in 2012, the number of submitted registrations has grown at an increasing rate, and we expect that trend to continue. We don't have the full results for 2021 yet, but from my preliminary analysis it has already surpassed 2020, which is what we expected. One thing to note is that this does not include what we call community-operated registries, or CORs; I'll talk about those in a moment. If we go to the next slide, the American Psychological Association has also reported that the number of journals offering Registered Reports has increased dramatically over the past several years, in step with the growth in registrations. If you're unfamiliar with this format, these journals review and provisionally accept an article for a research study before the data are analyzed and the results are known. This minimizes publication bias and other potential negative effects of peer review and other processes currently in place.
This slide and the previous one suggest that preregistration is becoming increasingly normalized within the scientific community, specifically in psychology, and we expect that to extend to other disciplines. If we go to the next slide, let's chat about community-operated registries. What are they? They are registries led by organizations, disciplines and other communities. Each is a specific type of registry: the community organization sets its own standards, creates customized registration templates for its specific needs, and moderates submissions to its own standard of rigor. Each discipline has its own needs and its own standards, so it needs registries and communities to moderate them. CORs on OSF are branded, meaning each has its own colors, images and icons, all set by the organization that builds and moderates it. We currently have five community-operated registries. Whenever a registration is submitted to a COR, it is moderated, and registrations that are both approved and public (we also have private and embargoed registrations) are archived to the Internet Archive through the pipeline we're about to present. And with that, I'll pass it over to my friend John. We'll see if I can get the remote control to work for me or if we'll do some backseat driving here. Yay, it works. There's a bit of a delay, but we'll see what we can do. Okay, everything has synced. So Mark talked about our community-operated registries. We also have OSF Registries, where anybody can submit a research project they're working on publicly, without going through another organization. We don't have any moderation standards there.
So it's very much up to the community and the people working on the project to own their quality, but that's kind of the beauty and the point of registrations, right? They're public, so folks can come in and ask: does your research look reasonable? Is your plan good? You can filter by different registries over on this side. It's going to be fun seeing when this lags and when it doesn't, but we'll make it through. Okay, as long as I'm interacting with it regularly it seems fine. There's search as well. During COVID, preprint servers blew up and registrations blew up, because we needed to get information out as quickly as we could, and we needed to make sure studies were rigorous, as we learned very early on from some dubious findings. So you can search by keyword and so on. Let's take a look at what a registration looks like. We are really trying to encourage preregistration, so front and center we have the questions from the template Mark described, which lay out what a given study looks like. How are you collecting your data? How are you anonymizing the results? What are your variables? What is your sampling plan? What is your analysis plan? Everything that belongs in this space. Okay, I've got control back. Over on the right-hand side, you can see a whole set of what we refer to as metadata about the registration: the contributors or authors who worked on it, a user-provided description of the research project, the template that was filled out, and the date the registration was submitted. There's also information about tags, the license, and a DOI; we mint a DOI for every public registration. So that right panel is all metadata. And on the left-hand side, you can see information about the OSF project that the registration is a snapshot of.
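For anyone who would rather pull the right-hand metadata panel programmatically than read it in the browser, here is a minimal sketch. It assumes the public OSF API v2 serves a registration as a JSON:API document at `https://api.osf.io/v2/registrations/<id>/`, and that `title`, `description`, `date_registered` and `tags` appear under `data.attributes`; those names, the registration ID `abc12` and the sample payload are assumptions for illustration, not details from this presentation. No request is sent; the parser runs against a hypothetical, trimmed-down response.

```python
from urllib.parse import quote

# Assumed base URL of the public OSF API v2 registrations endpoint.
OSF_API_BASE = "https://api.osf.io/v2/registrations/"


def registration_url(registration_id: str) -> str:
    """Build the API URL for one registration; no request is sent here."""
    return OSF_API_BASE + quote(registration_id, safe="") + "/"


def summarize_registration(doc: dict) -> dict:
    """Extract a few right-panel fields from a JSON:API registration document.

    The attribute names below are assumptions about the response shape.
    """
    data = doc["data"]
    attrs = data.get("attributes", {})
    return {
        "id": data["id"],
        "title": attrs.get("title", ""),
        "description": attrs.get("description", ""),
        "date_registered": attrs.get("date_registered"),
        "tags": list(attrs.get("tags", [])),
    }


# Hypothetical, trimmed-down payload used only to exercise the parser.
SAMPLE_DOC = {
    "data": {
        "id": "abc12",
        "attributes": {
            "title": "Example study",
            "description": "A preregistered example.",
            "date_registered": "2021-03-01T12:00:00",
            "tags": ["covid", "survey"],
        },
    }
}

if __name__ == "__main__":
    print(registration_url("abc12"))
    print(summarize_registration(SAMPLE_DOC))
```

To try it against a live record, you could open the URL with `urllib.request.urlopen` and pass the decoded body through `json.loads` before calling `summarize_registration`; check the OSF API documentation first, since the field names here are assumed.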
Even if you submit a registration without an OSF project, one is created in the background so you can continue to use it to organize your research materials. Let's click through to the files link here. Files are a big part of this whole archiving piece. The OSF supports file storage through providers outside our ecosystem: you can attach your Google Drive, your Dropbox and various other sources. But when you register a project with files in those sources, we download all of them and put them in OSF Storage, so we know they are static and immutable. You can see what the data looked like at the time the project was registered. OSF projects are hierarchical. You can nest components to organize your data in a way that makes sense to you, and to give different people different levels of access depending on the scope of the project. On this files page, you will see files from all of the child components as well, so from the top-level view of the registration you can see everything, which is really handy. You can also look at the components of the project to see how it was organized if you want more information. Will that back button work? Looks like I can't click it. Okay. So if you want to see more about the structure of the registered project hierarchy, you can look at the components, see all of them and click through. They share the same responses: there is one set of responses to the preregistration template for the entire registered project. But you can see information specific to each component. Every project also supports a wiki and other places where collaboration can occur, and the whole value here is that all of that collaboration and all the notes are archived and timestamped if you are registering from an existing project.
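The files page showing files from every nested component amounts to a depth-first walk over the project tree. The sketch below illustrates that idea with a hypothetical `Component` class and made-up titles and file names; it is not the OSF's data model, only a way to picture how a top-level view can gather files from all descendants.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Component:
    """Hypothetical in-memory model of an OSF project or one of its components."""
    title: str
    files: List[str] = field(default_factory=list)
    children: List["Component"] = field(default_factory=list)


def walk_files(node: Component, parents: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (component path, file name) for a node and every descendant, depth first."""
    path = parents + (node.title,)
    for name in node.files:
        yield "/".join(path), name
    for child in node.children:
        yield from walk_files(child, path)


if __name__ == "__main__":
    study = Component(
        "Study",
        files=["protocol.pdf"],
        children=[Component("Survey", files=["survey.csv"],
                            children=[Component("Pilot", files=["pilot.csv"])])],
    )
    for location, name in walk_files(study):
        print(location, name)
```

Because the walk is a generator, a caller that only needs the first few files does not have to traverse the whole tree.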
We also just added the ability to update responses: if you haven't started your analysis yet and you decide to change your plans, you can come back and do that. This is a brand-new workflow. I'm not a contributor on this project, so I can't do it here, but you would be able to come in and begin the update process. An update needs all the same approvals that the original registration did: all of the admin contributors on the project would need to confirm that it's a valid change. If the registration was submitted to a COR, the COR's moderators would need to approve it as well. We're probably about at time, but as we said, last year one of our big efforts was creating a workflow where you do not need a project to create a registration. You can just click Add New, indicate that you're not working from a project, and get going from there. And with that, it's probably time to hand it back to Peggy and let her have control of her own computer again. We'll see if I can get back in control. I'll stop explicitly; maybe that will help. That helped. Thank you so much. Let me move on to the next slide. Oh, we're in another view. Okay, here we are. So, OSF registrations in IA. I'm going to look at the same registration John showed, but on the archive side. Let's see if everybody can see this all right. Here it is on archive.org. You'll see the title and the authors. If you clicked on the first author, you'd be able to (let's give it a moment to load) see all the different registrations or other items associated with this author. Let's go back. You'll see the publication date as well as the description, which is taken from the body of the OSF registration. One thing I want to note is that you can see children: these are the other elements registered in affiliation with this work.
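The update rule John describes can be summarized as a small predicate. This is a hypothetical model written for illustration, not OSF code: the function name, its parameters and the idea of representing approvals as sets of usernames are all assumptions. It encodes only what was said above: every admin contributor must approve, and a registration in a community-operated registry also needs a moderator's approval.

```python
from typing import AbstractSet


def update_approved(admins: AbstractSet[str],
                    approvals: AbstractSet[str],
                    moderated: bool,
                    moderator_accepted: bool) -> bool:
    """Hypothetical model of the update rule described in the talk.

    An update needs every admin contributor's approval; if the registration
    belongs to a community-operated registry, a moderator must accept it too.
    """
    every_admin_agreed = set(admins) <= set(approvals)
    if moderated:
        return every_admin_agreed and moderator_accepted
    return every_admin_agreed
```

For example, `update_approved({"ana", "ben"}, {"ana"}, False, False)` is `False` because one admin has not yet signed off.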
If we click on one, just so you can see it, and let it load... Okay, it loaded. Here's a survey affiliated with this registration; it has its own individual page. Other elements to note are the DOI from OSF, as well as other categories specific to OSF that map to how the item was categorized in that system, such as OSF registration schema, OSF subjects and OSF tags. On the right, you have two options for downloading, and further down you'll see the collections the registration is affiliated with. It's in three collections. I think this is a good time to look at how a collection page looks. I'm going to go to OSF Registrations. You'll see the collection description at the top. Oh, it actually took us directly there; as you can see, there's an About tab and a Collection tab. If you go to the About tab, there's the longer description and some more of what we call collection-level metadata. You'll also see some stats on the right-hand side, which may be of interest to you. Shout-out to Bryan Newbold, who is also listed here. You can see items coming in and the top regions over the last 30 days, which might interest some of you as well. Now, if we go to the Collection tab, I want to share some of the filtering features that are already built in. These are on... oops, I lost my... oh, there it is. There's a search function. Let's say we search for COVID: you'll see 3,000 results, and you can see a further breakdown down here by date and year. There are also the collections these items are associated with; right here there are a couple of registries. If we wanted to go into the Character Lab registry, we'd be able to see those items. I also wanted to note that you can filter by creator if that's something you want to do.
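The same item fields can be read from the Internet Archive's per-item metadata endpoint, `https://archive.org/metadata/<identifier>`, which returns JSON with a `metadata` object. A minimal sketch follows. The OSF field names (`osf_registration_schema`, `osf_subjects`, `osf_tags`), the identifier and the sample values are assumptions modeled on the labels shown on the item page; confirm the real names in an actual response before relying on them. Since a metadata value may be a single string or a list, the helper normalizes everything to a list.

```python
from typing import Dict, List, Optional, Union
from urllib.parse import quote

# Internet Archive's per-item metadata endpoint.
METADATA_API = "https://archive.org/metadata/"

# Assumed field names mirroring the OSF categories shown on the item page.
OSF_FIELDS = ("osf_registration_schema", "osf_subjects", "osf_tags")


def metadata_url(identifier: str) -> str:
    """Build the metadata URL for one item; no request is sent here."""
    return METADATA_API + quote(identifier, safe="")


def as_list(value: Optional[Union[str, List[str]]]) -> List[str]:
    """Item metadata values can be a single string or a list; normalize to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def osf_fields(item: Dict) -> Dict[str, List[str]]:
    """Return the (assumed) OSF-specific fields from an item's metadata JSON."""
    meta = item.get("metadata", {})
    return {name: as_list(meta.get(name)) for name in OSF_FIELDS}


# Hypothetical response excerpt; the identifier and values are made up.
SAMPLE_ITEM = {
    "metadata": {
        "identifier": "osf-registrations-abc12",
        "osf_subjects": "Psychology",
        "osf_tags": ["covid", "survey"],
    }
}
```

Normalizing to lists up front keeps downstream filtering code from having to special-case single-valued fields.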
Going back, on the top right you'll also see options for sharing on social media and for favoriting the collection in your archive.org account, if you have one. The Play feature runs through individual items like a slideshow, which is mainly useful for visual artifacts. This has been a very cursory overview, and we welcome you to go in and explore. We'd also be interested in any feedback on how the registrations appear in the archive. We also have a metadata poll we'd like to share with everyone; we'll give everyone 20 seconds. We only included a few fields because Zoom limited us, so we'll open up the discussion later and are definitely eager to hear more. All right, it looks like most people were interested in subjects and DOI, but we look forward to hearing more takes as we go along. One other thing we wanted to note is that the Internet Archive also has an advanced search feature. There are some fields already provided, plus custom fields you can filter and search on, as well as dates and date ranges. One really nice feature is that you can export results in JSON, CSV and a number of other formats. We encourage people to use it as a tool when looking for all sorts of things in the archive, including the COS registrations. Moving to the next slide, we also wanted to talk briefly about the other research-focused work we're doing. Preserving and harnessing the rich research potential of the web is something IA has been involved in on many fronts. One exciting new addition is Internet Archive Scholar, which is spearheaded by our very own Bryan Newbold, who is on this call. Through IA Scholar, users can search over 25 million research papers and other scholarly documents preserved in the Internet Archive. We welcome you to go in and look around; it's exciting stuff.
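The advanced search export can also be driven by URL, which is handy for repeatable pulls. The sketch below builds a query URL for `https://archive.org/advancedsearch.php` using its `q`, `fl[]`, `rows`, `page` and `output` parameters, and reads records from the `response.docs` list of a JSON result. The collection identifier `osf-registrations` is an assumption; check it on the collection page. No network request is made.

```python
from typing import Dict, List, Sequence
from urllib.parse import urlencode

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def advanced_search_url(query: str, fields: Sequence[str], rows: int = 100,
                        page: int = 1, output: str = "json") -> str:
    """Build an advanced-search URL; `output` may be "json" or "csv"."""
    params = [("q", query)]
    params += [("fl[]", name) for name in fields]  # one fl[] entry per returned field
    params += [("rows", str(rows)), ("page", str(page)), ("output", output)]
    return ADVANCED_SEARCH + "?" + urlencode(params)


def docs_from_response(payload: Dict) -> List[Dict]:
    """JSON output nests the matching records under response.docs."""
    return payload.get("response", {}).get("docs", [])


if __name__ == "__main__":
    # The collection identifier is an assumption; check it on the collection page.
    print(advanced_search_url("collection:osf-registrations AND covid",
                              ["identifier", "title", "date"]))
```

Switching `output` to `"csv"` gives a file that opens directly in a spreadsheet. Paging through `page` with a fixed `rows` keeps each response small when a query matches thousands of items.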
And going on... oops, that was a little too far. There we go. Sorry, it's delayed on my screen. The Internet Archive also supports those engaging in web archive research in the social sciences and humanities, particularly humanities researchers, local and national libraries, academic institutions and cultural organizations. These services harness the Internet Archive's computing infrastructure to build custom datasets for partnering scholars, researchers and policymakers. One project to highlight is our partnership with the University of Minnesota and Duke University, which examined the health of local news media by capturing local news sites and using the WARC data format for analysis. You can also review the data in their public collection, as an example of researchers who have shared directly in the archive. Lastly, we wanted to share the documentation. The documentation is where you can learn much more about the work we've done on this project and get a bird's-eye view. Appropriately, it's housed in an OSF project. There you'll find information on the history of the project, the Open Science Framework, the Internet Archive, and the registrations and what's available in them, all broken down very clearly, along with a few other options to look through. One thing we want to point out specifically is the how-to section at the end of the documentation. It's a place for folks to explore how they could use items they find in the collections for bulk export and bulk exploration. Sorry, it's taking a little time to load and there's some delay, so I appreciate everyone's patience as I navigate down. Okay, we're getting there.
Among these options, you'll see recipes or suggestions on how best to filter content and how to work with the command line, which we use for much of our work at the Internet Archive. There are also sections on searching specific areas in bulk. These are general suggestions meant to serve as a guide; please feel free to look through them, adapt them, and focus on the areas of particular interest to you and your organization. We'll share these links and this documentation with everyone after this webinar. With that, we wanted to give folks a chance to ask any questions. We've given you a broad overview, but we're ready to learn more from you and to drill into anything you'd like to know more about. Everybody on our panel, including the engineers working on this project, is available for that. Are there any questions? Feel free to raise your hand. We also have some questions for you, so we can share those as well. Okay, maybe we'll start with our questions. We're very curious how this work, and these registrations on both the OSF side and the Internet Archive side, might be of interest to you and your institution. What brings you here? Is there anything in particular that drew you to this webinar? Please feel free to use the chat. That's a quiet group; it's a bit early. I actually had a question for Bryan. Bryan, I want to do the documentation justice, especially the how-to section, where you had a couple of great examples. Do you have any advice for getting started, especially for folks who are less familiar with the command line or some of the other tools in there?
Yeah, we have a couple of examples in there using Unix-style command-line tools, which may not be the set of tools everyone is most familiar or comfortable with, but they can be powerful for working with large, kind of superhuman numbers of individual items. My usual advice with this kind of content is to first get a little intimate with it. Try to find some specific individual examples you're curious about: browse on the COS side, find the corresponding item on the archive.org side, then download and extract them. They're just zip files and the like, so you can open them with your regular desktop operating system and poke around before jumping into automation. And support is always available, from any of us individually, through this webinar or in the context of this project, and the archive in general has support contacts for adapting to specific tools or workflows. So there are a number of options for transferring and analyzing content. Great, thank you so much; that's super helpful. Any other thoughts from anyone who's open to sharing more? Oh, Mark, you had a question. I do. This project has been going on for two years, and as we all know, timelines have changed and evolved, and there's been a lot of progress in those two years. So, rolling that back, this is a question for John and I think also Bryan: what sparked this project? What got those gears turning and the ball rolling? I can try to take a bite at it first. My memory of the early discussions is that the motivation was twofold. On the archive side, we're interested in getting more and different types of content. In particular, as an organization we often focus on the infrastructure for collecting content and preserving it over the long run.
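Bryan's advice to download a single item and unpack it by hand can also be done in a few lines of Python's standard `zipfile` module once you are ready to script it. This sketch assumes the download is an ordinary zip file, as described above; the archive name `example_registration.zip` and its members are hypothetical stand-ins created in a temporary directory so the example runs without network access.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def list_zip(archive: Path) -> List[str]:
    """Return member names in the order they are stored."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def extract_zip(archive: Path, dest: Path) -> List[Path]:
    """Extract every member into `dest` and return paths of the extracted files."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # extractall strips absolute paths and ".." components from member names.
        zf.extractall(dest)
        return [dest / name for name in zf.namelist() if not name.endswith("/")]


def demo() -> List[str]:
    """Build a small stand-in archive, extract it, and return its member list."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        archive = root / "example_registration.zip"  # hypothetical file name
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("responses.json", "{}")
            zf.writestr("files/data.csv", "a,b\n1,2\n")
        extracted = extract_zip(archive, root / "extracted")
        assert all(path.is_file() for path in extracted)
        return list_zip(archive)


if __name__ == "__main__":
    print(demo())
```

Listing members before extracting is a cheap way to check what an item contains, in keeping with the advice to get familiar with individual examples before automating.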
That's really what we optimize for: keeping our costs down. We own and operate all of our own infrastructure, and we focus a lot on the nuts-and-bolts engineering of storing content. We recognize that we haven't always been the strongest at partnering with people, getting content into the archive and making it more accessible. So for us, it's very exciting to work directly with external organizations and their content. That's a bit of the motivation from the archive side. On the COS side, from my early discussions, it was about sustainability over time, in particular the cost of cloud storage and bandwidth. I don't want to go into too many details, but COS mostly works on a model where uploaded content is stored on cloud storage services, and when people download it, it goes out over cloud egress bandwidth. Both of those cost money every month. If you project forward over the next 10 or 50 years, that cost just keeps growing as there's more and more content and potentially more and more downloads. So the projections don't look great, and it becomes difficult to keep up. We at the archive, somewhat uniquely, have a model where we fundraise around a sustainability plan so we can estimate costs up front: we can take content in and compute what it will cost to provide access to it for the next 100 years. So it seemed like a natural partnership: we can take on some of the long-term costs, the projected costs of storing and providing access to content over the next 100 years, in a way where we do the budgeting up front today rather than trying to project it all the way forward.
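Why pay-as-you-go costs keep climbing can be made concrete with a toy model. The sketch below assumes a fixed amount of new data each year, a monthly storage charge on everything stored, and yearly downloads proportional to the amount stored. Every rate is a hypothetical input and none of it reflects COS's or any cloud provider's actual pricing; the point is only the shape of the curve.

```python
def projected_cost(years: int,
                   new_tb_per_year: float,
                   storage_per_tb_month: float,
                   egress_per_tb: float,
                   downloads_per_stored_tb_year: float) -> float:
    """Cumulative pay-as-you-go bill over `years` under a toy model.

    Stored data grows by a fixed amount each year, storage is billed monthly
    on everything stored, and yearly downloads scale with the amount stored.
    Every rate here is a hypothetical input, not a real price.
    """
    stored_tb = 0.0
    total = 0.0
    for _ in range(years):
        stored_tb += new_tb_per_year
        storage = stored_tb * storage_per_tb_month * 12
        egress = stored_tb * downloads_per_stored_tb_year * egress_per_tb
        total += storage + egress
    return total
```

With one new terabyte a year at a unit monthly rate and no downloads, the cumulative bill is 660 after 10 years and 2,520 after 20: doubling the horizon nearly quadruples the total, because the yearly bill itself grows with every year of accumulated data. That growth is what an up-front, endowment-style budget is meant to absorb.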
Anyway, maybe I'm rambling a little, but it just seemed like a natural partnership between the two organizations. And to foreshadow a bit, we started with registrations because they're perhaps the most immutable of this kind of content; the entire point is to make a commitment. Embargoes are possible, changes are possible, and it's possible to take things down if there's a problem. But in general, the norm is that registrations aren't updated every month and there isn't a continuous stream of new revisions, so there's less need to continuously synchronize the two systems. We're certainly interested, going forward, in accepting other content such as entire projects, preprints or datasets. That's a future-looking statement, so don't hold me completely accountable for it, but registrations seemed like a good place to start with this kind of problem. Yeah, and just to follow that up from the OSF side: we work with a lot of storage providers and a lot of people who handle storage, but what makes IA different is that they're ideologically aligned with us on open access, on making things totally public and entirely free, for archival purposes. So they fit perfectly into the research lifecycle for us, into that final archival step where the work is going to be out there forever. Dealing with storage backends is something we do a lot, but we had a special interest in this one because our missions are so aligned and it fits into that part of the lifecycle. So it's a bit of what we typically do, plus a little extra.
I feel like we were looking for something to do with IA right from when we were formed as an organization, and it's taken until now to cement this relationship to the point where we have a real project we can expand and reap the rewards of. According to the metrics, many people have already seen registrations on IA, and we have thousands of registrations there, so it's good to see this finally come to fruition. Thank you. I think this is a monumental step toward the transparency and preservation of research, and toward making sure that research not only continues but grows. As we all saw with COVID, there's a real need to have open science and open research readily available. So thank you both. All right, thank you so much, everyone. If there are no other questions, we'll give everyone back 14 minutes of their hour. We will share the documentation and slides with everyone. Additionally, Mark has graciously offered to answer any further questions or provide any information you'd like about this project. I'm going to drop Mark's email in the chat, and we'll also include it in the follow-up email, but Mark is available directly at mark at COS.io. Thank you, everyone, for spending time with us this morning; please do explore the registrations and share any feedback with us. One last thing: if you ever want to revisit this video, we will be hosting it on YouTube, and the link will be included in the email as well. Thank you so much, Mark.