Let's get this going. So, hi, I'm Nick Weber. I'm from the University of Washington and I'm also representing the Qualitative Data Repository, where I'm the technical director. What I'm going to talk about today is an initiative we have called Annotation for Transparent Inquiry, and I'll explain that as I go. But I wanted to thank Gardner in particular for the keynote, so I called a little bit of an audible in making my slides.

I think there's an underlying thesis to the work that we're doing collectively here in this room, and in particular in the ATI initiative, and it's this: flourishing on the scholarly web is going to require collaborations that maximize our comparative advantage for collective good. We can think about comparative advantage as the basic economic principle that if we have an open and free exchange of goods, and each of us maximizes what we're best at doing, we can cooperatively create greater wealth. But I would say that right now the scholarly web is not necessarily creating greater wealth for everyone. That is to say, there's an increasing amount of pressure on individual researchers to conform with emerging norms around transparency, replication, and verification, and it puts the onus, the responsibility, almost entirely on those individuals. It doesn't distribute that responsibility to a set of institutions or technologies that can assist them. And so this is creating gaps in wealth between different types of research or modes of scholarship.

I also want to highlight that calls for transparency are somewhat separate from replication or verification initiatives. The transparency initiative of the American Political Science Association defines transparency in three modes. The first is production transparency; you can think about that as the way data are collected. The second is analytic transparency, or the methods that are brought to the data. And the third is data transparency, which is access to the underlying empirical evidence itself.

In the social sciences, at least, quantitative transparency has a pretty good template. It's maybe emerging more slowly than we would expect, but it looks something like this. This is, I think, a really good example from Cambridge University Press, from a set of researchers publishing in Political Science Research and Methods, the journal. What we see here is the typical kind of narrative HTML we would expect, but there's also an interactive environment provided by Code Ocean. So we have a set of actors that are maximizing their comparative advantage: we have an executable environment from Code Ocean, we have the data backed up and archived in Dataverse, and we have a publisher who has provided this infrastructure and coordinated these different actors so that an end user can experience that sort of transparency in the research. So this is the quantitative model.

Transparency in the qualitative domain is lagging for a number of reasons. There's not the same tradition of sharing, because private information about individuals is often embedded in the types of research done there. But there also aren't many good existing models that do anything more than add onto the traditional scholarly article.
That is to say, these are bolt-on or supplemental materials, footnotes and notes, et cetera. One of the early progenitors of some of the work I'm going to talk about is active citation. This was an idea from Andrew Moravcsik at Princeton, who said that what we should be doing is providing a set of technologies that allow people to explicate a particular citation they have made, explaining why they're making the citation and, if it's archival, providing access to that resource in something like a transparency appendix, which appears at the end of an article. But this is costly, right? It requires publishers to innovate and change the model in which they typically publish other kinds of scholarship; they're going to have to make an exception for qualitative scholars. It also requires copyright, and the hosting of any data linked there, to be mediated by the publishers themselves. So altogether, not a terribly efficient way to provide transparency for qualitative research.

Okay, and unfortunately I guess we can't see this on the big monitors, but I thought this was a great example of another kind of solution. This is a footnote that was tweeted out by a historian, Kara Shilling. I'll read the footnote to you. It says: the figure is from a New York Times article of the period and will probably show up if you search for the keywords barge captain, 1,258 dead horses, or Barren Island in ProQuest Historical Newspapers. I've lost the bloody reference, and I'm confessing my carelessness to the world instead of spending my day looking for the exact piece. Perhaps you could just trust me on this one. If you can, then I thank you. Okay, so because Twitter is this beautiful networked world, ProQuest responded and actually found the article. But this is incredibly inefficient, right? This is not how we expect scholarship, or science, to operate, and we can do better than this in the 21st century. We have the socio-technical means to develop a template for qualitative transparency that meets the needs of production, analytic, and data transparency and is simultaneously compliant with the FAIR data principles: findable, accessible, interoperable, and reusable.

So the ATI initiative is very much about this, and it has three partners: Cambridge University Press, the Qualitative Data Repository, which I'm here representing, and Hypothesis. What ATI was envisioned as doing is creating dynamic links to the original source material that supports authorial claims; providing the ability to integrate with a range of scholarly publishers, using web standards to anchor those linked materials to published texts; and offering a scalable method of preserving the underlying data and creating persistent identifiers for that linked content. So essentially what this means is that we got some funds from the Robert Wood Johnson Foundation and NSF to pay authors to go back and annotate their articles to maximize transparency along three different lines: the data, production, and analytic transparency behind them. We picked 40 different authors and split them into two groups. One group was going to annotate publications that were already published; they existed in the scholarly record and were anywhere from one to five years old. The other group of authors were still writing their articles.
So they were in the formative process of generating a narrative and thinking about how to craft supplemental material that would support that narrative. We also paired all 40 authors with a peer reviewer. The peer reviewer would first read the article unannotated and note the expectations they had around the transparency of the article, the claims made, and the data sources they expected to see. Then we showed them the annotated article the author produced, asked whether it lined up with their expectations, and asked them to comment overall on whether the transparency of the claims made in the article had improved.

Okay, so some preliminary observations, having run both of these workshops already. The median number of annotations was around 28. This varied quite a bit across the different disciplines participating; we had everyone from sociologists to epidemiologists to linguists to political scientists, historians, legal scholars, et cetera. It also differed pretty substantially depending on whether you were annotating pre-publication or post-publication. Authors working pre-publication tended to have more leeway; they could still decide exactly what they would annotate and why. The median time to assemble a project for ATI, meaning to collect the data, anchor it to the text, and explain the logic of why you were anchoring a piece of data to a piece of text, was around 40 hours. That ranged anywhere from 12 to about 200 hours, and that's time individuals spent not writing the text; this is just going back, revisiting, and annotating the text to make it more transparent. And pretty significantly, the time to review the annotated text we provided to reviewers was about 12.3 hours, which reviewers overall reported as significantly more than they would spend reviewing an article in the traditional way.

Okay, so this is what an ATI compendium looks like. This is the interface of a software application called Dataverse, which QDR runs a particular fork of. I guess you can't quite see it, but there's an author description of the logic of the transparency they were trying to facilitate, and then there's access to all of the raw data files here. Then what we have is the actual annotated article. This was, I think, a really innovative use of the ATI protocol. This was a linguist who was looking at Scottish pronunciation. What they did was record someone saying a set of words with an in-group, that is, someone from their community, and then record them with an out-group, someone external to their community. They then do a sort of differential of how people pronounce things with an in-group versus how they pronounce things with an out-group. One of the challenges of this type of comparative linguistic research is providing reliable access to the underlying files and allowing people to hear those files while they're reading a manuscript. So if you click on one of those annotations, it will take you to QDR, where you can listen to the recording of the individuals, but we also played around with allowing people to listen right within the browser, so that while they were looking at the annotations and the text, they could listen to that linguistic variation.
All right, and so I think this has an analog to the comparative advantage we looked at in the quantitative template. What's emerging is a set of actors who can each maximize what they're good at: QDR can preserve the data, Hypothesis provides a set of technologies that allow us to link that data, and Cambridge coordinates and organizes this and puts it onto the web, hopefully in a sustainable manner.

Okay, so the outcomes of ATI depend pretty wildly on whom you ask. Reviewers reported that they asked better questions, but that it required more work to sift through, make sense of, and interpret the primary resources. One individual said, in particular, "I have more doubts with more transparency." I would argue, as a social scientist, that that is actually a pretty good thing for science. Publishers have new modes of engagement and potential on-ramps for discovery of articles through increased links to the content, both from our repository and from Hypothesis discovery tools built on top of it. Authors were very much split. There's increased opportunity for criticism of the claims they're making, because they're opening themselves up to new lines of critique. This person in particular says, "I feel more vulnerable, to be honest, and I'm not sure what my motivation is for incurring even more criticism." This person went on to explain that the feedback they got, they felt, substantively improved the manuscript they had under review, but they were then quite nervous about responding, because the review process might have been considerably easier had they not annotated the article. Again, as a social scientist and as someone supporting a scholarly web that wants to be more reliable and more transparent, I would argue this is a really good thing for science. We're opening up new modes of criticism, but that should hopefully improve the veracity of the claims we're trying to make.

Okay, so I'll end very quickly by saying that one of the other motivations here is to do all of this linking of annotations, all of this production of data and facilitation of transparency, in a way that's sustainable. As a data repository, we want to be authoritative stewards of the scholarly record, but we also want to do that in ways that promote annotations as first-class research objects. And to do that, we need them to be citable, we need them to be discoverable, and we need them to be shared and reused in sustainable ways. So what our team at QDR did, working with the Dataverse software, which is an open source package anyone can develop against and implement, is add a field to our ingest form; this is probably hard to see if you're not close to a monitor, but the ingest form is what individual data depositors fill out when contributing their data to QDR. The new field lets them link a URL for a Hypothesis group, and once they enter it, we call the Hypothesis API and pull down a JSON file that has all of the annotations that author made. That author's annotations then come into our repository and become discoverable, because we index that JSON object. That means we can search against the content of the annotations, and we also have the links to documents that are embedded in those annotations.
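To make that ingest step concrete, here is a minimal sketch of pulling a group's annotations from the Hypothesis search API and flattening them into records that could be indexed. The group ID, API token, helper names, and record shape are illustrative assumptions, not QDR's actual implementation.

```python
import requests

HYPOTHESIS_SEARCH = "https://api.hypothes.is/api/search"

def fetch_group_annotations(group_id, api_token, page_size=200):
    """Page through the Hypothesis search API and return every annotation in a group."""
    headers = {"Authorization": f"Bearer {api_token}"}  # token is needed for private groups
    rows, offset = [], 0
    while True:
        resp = requests.get(
            HYPOTHESIS_SEARCH,
            headers=headers,
            params={"group": group_id, "limit": page_size, "offset": offset},
        )
        resp.raise_for_status()
        batch = resp.json()["rows"]
        if not batch:
            break
        rows.extend(batch)
        offset += len(batch)
    return rows

def to_index_records(annotations):
    """Flatten each annotation JSON object into the fields worth indexing for search."""
    records = []
    for a in annotations:
        selectors = (a.get("target") or [{}])[0].get("selector", [])
        quote = next(
            (s.get("exact") for s in selectors if s.get("type") == "TextQuoteSelector"),
            None,
        )
        records.append({
            "id": a["id"],
            "source": a.get("uri"),      # link to the annotated document
            "anchor": quote,             # the passage the annotation is attached to
            "body": a.get("text", ""),   # the author's explanation
            "tags": a.get("tags", []),
        })
    return records

# records = to_index_records(fetch_group_annotations("GROUP_ID", "API_TOKEN"))
```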
Beyond providing the object itself and the ability to search against it, we also created an individual view, which is an HTML rendering of that JSON showing the annotations out of context. So right in the Dataverse repository, you don't have to leave in order to see the annotations and what they consist of. You can't see the radio buttons at the top, but this view also lets you, at the press of a button, go out and find the article in its annotated context, where you get the rich HTML version with the Hypothesis overlay. Or you can just download the file, or download a compendium of files associated with that set of annotations.

Okay, so to wrap up what the thesis of the talk was about: we're empowering scholarly communities to collaborate and cooperate in maximizing their comparative advantage. Publishers organize peer review and mark up the manuscripts. Software nonprofits like Hypothesis provide interoperable features by building on W3C web specs. And data repositories like QDR get to provide persistent identifiers, curate that content, and hopefully provide long-term storage and preservation of the data underlying the claims made in these scholarly articles. We have a number of future directions, and we'd be excited to work with any publishers, scholars, or people doing social science data archiving in general. One thing we're pretty excited about is that for the articles that have already been annotated, that is, they were published and the author went back and annotated them, we're seeing a boost in activity on those pages. We're seeing people visit the pages more often, download those articles more often, and drive increased traffic to those landing pages in QDR. I would say this is a pretty good proof of concept that this can be a sustainable model. So thank you very much for your time. I'd also like to say thank you so much to the Hypothesis developers who helped us figure out a number of finicky challenges around how we provide access through closed, restricted groups. It was a huge benefit to us and really paid off over the last two months. It's been a beautiful collaboration, and we hope to sustain it over time.

So if you have questions for Nick, that would be great. The reason we use the mics is that not everyone can hear equally well, and also we're recording, and it helps make the recording audio good. Look, Steele's walking toward the mic now, he's approaching, he's almost there. I didn't do too badly on time, either.

Thanks for the great presentation. I had a question about something that was just tangentially mentioned on your last slide. You described a bundling tool as a future direction that would allow people to upload an XML or Word document file with annotations in it, which would become a PDF with anchored annotations. That sounds kind of mind-blowing, and I want to know what you mean by that.

Me too. So this is a proposal we've gotten a bunch of traction on; it's the idea of a developer on our team. The basic idea is that when you're annotating a manuscript as you're writing it, you're often leaving notes to yourself and attaching data files using the annotation capabilities of a Word document. And that's just an XML encoding, right?
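As a minimal sketch of that idea, assuming the manuscript is a .docx (which is just a zip archive of OOXML parts), the comments and their authors can be read out of word/comments.xml. The function name and output shape here are illustrative, not an actual QDR tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used throughout a .docx
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def extract_docx_comments(path):
    """Read the comments (annotations) out of a .docx, which is a zip of XML parts."""
    with zipfile.ZipFile(path) as z:
        if "word/comments.xml" not in z.namelist():
            return []  # the manuscript has no comments
        root = ET.fromstring(z.read("word/comments.xml"))
    comments = []
    for c in root.findall(f"{W}comment"):
        body = "".join(t.text or "" for t in c.iter(f"{W}t"))  # concatenate the w:t text runs
        comments.append({
            "id": c.get(f"{W}id"),
            "author": c.get(f"{W}author"),
            "text": body,
        })
    # The text each comment is anchored to lives in word/document.xml between
    # w:commentRangeStart and w:commentRangeEnd elements with matching ids;
    # recovering and preserving that anchor is the harder part alluded to below.
    return comments
```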
We should be able to pull that out, take the file that's associated with it, and take the content of that annotation. And then, through some maybe magic, and it's yet to be seen whether we can do this, I assume Dan and other developers in the room probably have ideas about why this would be really hard, we'd try to continue to manage the anchor of that text. So whatever the annotation is anchored to, we would try to preserve. If nothing else, that gives us a bundle we can ingest and ultimately curate, and even if we have to do the annotation manually, it takes a few steps out of the process for QDR. If we can do it somehow automatically, it would be an enormous benefit to scholars who are, right now, I think, really struggling with the authoring process. Yeah, so to be determined, I guess, is the short way of saying that.

So hi, very nice presentation. I have a two-part question. One is, you mentioned, I think at the very end, that you noticed that data sets associated with articles that had annotations received more usage or page views. Do you have any data to disambiguate whether this is simply a data-linking effect or whether the annotations themselves somehow brought more users?

It's exceptionally preliminary; I would hold it as far away from any sort of solid claim as possible. It's something we've noticed in the two months that these things have been published. What's particularly interesting, though, is that some of these articles have existed in their published form for about five years, some three years, and are now seeing a spike in interest. That spike could be for lots of reasons. It could be just because they were part of this initiative and more people are going to those pages; colleagues are excited, so they're sharing a DOI to that article again. But maybe not, right? It's something I think we'll keep our eye on, and hopefully people doing innovative altmetrics work would be interested in it.

And the other part of the question is, what was it? Oh, if I understood correctly, you showed how people can basically upload JSON annotations to your data repository, and that is a way for you to provide discoverability of those annotations. Doesn't that speak, though, to an unmet need for annotation search on a wider platform? I mean, this seems to be a hack that you wanted to do, is that right?

Yeah, absolutely, it's a hack. It's dealing with the annotations that exist, that QDR has curated and people have uploaded. A broader search, one that would go across different groups and across different tags or collections, would, I think, be beneficial. That sort of global search is something other people could probably build tools on top of and facilitate. But yeah, I think that's a really good idea. It's also something we've thought about, whether or not we would want to maintain and do it ourselves.
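As a rough illustration of the kind of global annotation search that could be built on top of the public Hypothesis API, here is a minimal sketch of a cross-group search by tag. The tag value and helper name are illustrative assumptions, not an existing QDR or Hypothesis tool.

```python
import requests

def search_annotations_by_tag(tag, limit=50):
    """Search public Hypothesis annotations by tag, across all public groups."""
    resp = requests.get(
        "https://api.hypothes.is/api/search",
        params={"tag": tag, "limit": limit, "sort": "created", "order": "desc"},
    )
    resp.raise_for_status()
    return [
        {"uri": a.get("uri"), "user": a.get("user"), "text": a.get("text", "")}
        for a in resp.json()["rows"]
    ]

# Example (hypothetical tag):
# for hit in search_annotations_by_tag("ATI"):
#     print(hit["uri"], "-", hit["text"][:80])
```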