platform, not just for publishing, but for networked software. And this is the thing I remember from Andrew's talk. He had a slide. The slide had on it a UPS tracking URL. And he just invited everyone to have a really good long look at that URL and think about what it meant. He kept saying it over and over: this is amazing. It's like every package on the web has its own homepage. And it wasn't just that the package had its own address on the web. It was that it named an instance of a business process. And the people who had access to that business process were UPS employees. They were also customers. But also programs, machines, had access to that process, or to the data that was in it. So those of us who had started writing Perl programs to scrape web data and repurpose it and reuse it were also hitting that same kind of URL, for different and complementary reasons.

So as we all learned, URLs can point to all kinds of resources, right? Audio and video, documents of all kinds. But for a lot of us here in particular, it's the idea of pointing to text inside of a document. So words and sentences, cells in HTML tables. And this set of URL-addressable resources has really two properties: it's infinite and it's interconnected. So 20 years later, we're still kind of figuring out what you can do in this infinite universe of interconnected web resources that you can manipulate with a few simple commands like GET and POST and DELETE. And when you're working in an infinitely large universe, it can seem kind of ungrateful to complain that it's too small. But actually, there is a much larger universe contained within that one.

So let's contemplate this URL and just think about what it really means. Well, I'll show you one view of what it means. A human being who follows that link is going to visit Roy Fielding's dissertation on web architecture, is going to scroll down to a selection of text inside of that document, and there be able to assign tags, maybe have a discussion about that fragment of text. A program that accesses, well, not quite that URL, but a variant of it, is going to retrieve this resource. The URL that identifies that resource is not itself any kind of a standard; it's just a link that points to a web resource. But the resource itself is, since February of this year, a W3C standard. And the way I like to think about it is that the highlighted phrase in that document, and every possible highlighted phrase, has its own homepage on the web, a place where humans and machines can jointly focus attention. So if we think about the web we've known as a kind of fabric woven together with links, the annotated web increases the thread count of that fabric.

And so, working with my panelists here, I've had the privilege to spin out a series of applications that explore the implications of that idea. So I'm going to show you some examples, and then I'm going to invite my collaborators, who are Mike Caulfield, whom you've maybe met a couple of times already, Anita Bandrowski with SciCrunch, Beth Ruedi with AAAS, and Maryann Martone with UCSD and Hypothesis, to talk about what these apps are doing for them now and where we hope to be able to take them.
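To make that concrete, here is a minimal sketch, as a JavaScript object literal, of what such a standard annotation resource looks like under the W3C Web Annotation Data Model. The target is Fielding's dissertation, as in the demo, but the particular quote and the selector values here are illustrative, not taken from the talk.

```javascript
// A W3C Web Annotation: the "homepage" for one highlighted phrase.
const annotation = {
  "@context": "http://www.w3.org/ns/anno.jsonld",
  type: "Annotation",
  body: {
    type: "TextualBody",
    value: "This phrase now has its own address on the web.",
    format: "text/plain"
  },
  target: {
    source: "https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm",
    selector: {
      type: "TextQuoteSelector",                 // anchors to a phrase, not a page
      exact: "Representational State Transfer",  // illustrative quote
      prefix: "architectural style for ",        // illustrative context
      suffix: " distributed hypermedia"
    }
  }
};
```

The selector is the key move: it's what lets humans and machines anchor to the same phrase, rather than just to the same page.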
So the first one up is Science in the Classroom. This is an AAAS project (I guess only a few people were there yesterday, so I'll just repeat a few things) that annotates a collection of primary literature from the Science family of journals. Annotations are done, I think, mainly by graduate students, for the benefit of younger students, to help them understand the methods and outcomes of scientific research. So here is one of those annotated papers. There's a widget called the Learning Lens that toggles layers of annotation on and off. Here I've selected the glossary layer, and I've clicked on the word distal to reveal the annotation that's attached to it.

If we look behind the scenes, Hypothesis was used to annotate the word distal. But Learning Lens predated Science in the Classroom's use of Hypothesis, and the Science in the Classroom team wanted to keep using Learning Lens to display the annotations. What they didn't want was the workflow that was behind it, which, as I understand it, involved an extremely laborious process of, I think, even manually inserting content into HTML pages. So the idea was: use Hypothesis to create the annotations, and then use JavaScript in the Science in the Classroom pages to retrieve the annotations and write them into those pages using the same format that had previously been applied by, I guess, the webmaster. So the pre-existing and unmodified Learning Lens JavaScript code could do what it did before, which is pick up the annotations, assign color-coded highlights based on tags, and show the annotations when you click on the highlights.

What made this possible was a JavaScript library that helps with the heavy lifting required to attach an annotation to its intended target in the document. That library is part of the Hypothesis client, but it's also available as a separate standalone component, which is what we're using here. It's a nice example of how an ecosystem of open-source components can enable interoperable annotation services. And I was also gonna give a shout-out to the guy that wrote that library, because he was here this morning, but I don't see Randall. There he is. Randall, thank you for that.
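The page-side half of that flow might look roughly like this: a minimal sketch, assuming the real Hypothesis search API, with a hypothetical renderHighlight function standing in for whatever Learning Lens actually does with each annotation.

```javascript
// Fetch this page's annotations from Hypothesis and re-emit them in the
// form Learning Lens already understands. renderHighlight is hypothetical.
async function loadLearningLensAnnotations(pageUrl) {
  const resp = await fetch(
    `https://api.hypothes.is/api/search?uri=${encodeURIComponent(pageUrl)}&limit=200`
  );
  const { rows } = await resp.json();
  for (const ann of rows) {
    // Each annotation carries selectors describing where it anchors.
    const quote = (ann.target[0].selector || [])
      .find(s => s.type === "TextQuoteSelector");
    if (!quote) continue;
    renderHighlight({
      exact: quote.exact,               // the highlighted text to re-anchor
      layer: ann.tags[0] || "glossary", // e.g. color-code highlights by tag
      note: ann.text                    // the annotation body to show on click
    });
  }
}
```

In the real system, re-anchoring each quote to the right spot in the page is the job of the standalone Hypothesis anchoring library just mentioned.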
Okay, so we heard also this morning a bit about Stefan Candea's European Investigative Collaborations, and Mike Caulfield's work with student fact-checking. So this is the app that I've worked on with Mike to help his students marshal evidence and make sense of the web. Now, these investigations are written in a wiki, and also displayed in a wiki. This is one of those examples, an investigation that I did myself in order to test out the process that we were developing, which required me to gather a whole lot of supporting evidence before I could even begin to analyze the claim. So I used a Hypothesis tag to collect those annotations, and you can see them in this Hypothesis view. Now, I can be very diligent and disciplined about using tags this way, but it's a lot to ask of students, or really almost anyone. So we created a tool that knows about the set of investigations that are underway in the wiki, and offers the names of those pages as selectable tags.

So here I've selected a piece of evidence for the investigation. I'm going to annotate it, not by using Hypothesis directly, but instead by using a function in a separate browser extension. That function uses the core anchoring libraries to create annotations in the same way that Hypothesis does, but it leads the user through an interstitial page that asks which investigation the annotation belongs to, and assigns a corresponding tag to the annotation that it creates. Back in the wiki, the page embeds the same Hypothesis view that we've already seen, as a related-annotations widget that's pinned to that particular tag.

I ended up with so much evidence for this investigation that I needed another way to organize it and try to make sense of it. So I came up with another widget that we are calling the timeline. It gathers a subset of the annotations, the ones that have been tagged with dates, and displays them this way. To put something onto the timeline, I use another one of these right-click helpers: select a date on a page, use a function that says add selected thing to timeline, and choose the tag which targets where that thing is going to attach. This is what the annotation looks like in Hypothesis; it has a tag that's the date. Over in the wiki, the JavaScript that runs in that page organizes that stuff on the timeline. Now, it turns out that publication dates aren't always evident on web pages, so sometimes you have to do some digging to find them.

Now, I think one of the key points here is that when a student does add a date to a web page in this way, they haven't just populated a timeline on a particular investigation page. They've upgraded the web in some small way, because anyone who visits that page with either Hypothesis or any other annotation client is gonna have the benefit of that work that was done by the student, and will have an indication of what the date on that page was. And that's true for all of these annotations that the students are creating. They're using them in a particular context, but those annotations are generally available to be reused by others in a variety of ways.

So the last, and actually most popular, feature that we've added to this toolkit is called footnotes. Once you've gathered all your raw material into the related-annotations bucket, and maybe organized some of it on the timeline, you wanna weave the most pertinent references into the analysis you're writing. To do that, you find the annotation that you gathered in our embedded annotation viewer here, use a copy-to-clipboard function to copy what we call the direct link, and wrap it around some text in the article. And now, when you refresh the page, here's what you get. There are really two things happening here. The link on the word posted is a straight-up Hypothesis direct link, so it does what a direct link does: it takes you to the page, scrolls you to the context, highlights the annotation, and opens up and displays in the sidebar whatever is related to that annotation. But if you have 15 of these footnotes in a page, it can be kind of tedious to open them up into 15 tabs. So the other piece of this is that the link here, in this case the superscript one, is pointing just down into the page. There's a section in the page now that does what Ted Nelson would call transclusion. Nate Angel for some reason hates hearing me say the word transclude, but it's what it does: it transcludes the footnotes into the page. So now you have all of the context from those separate annotations that were marked up and directed at the page available directly within the page.
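Here's a minimal sketch of both widgets' plumbing. The helper names (buildTimeline, transcludeFootnote) are hypothetical, and the ISO date-tag format is an assumption about the convention described above; the API endpoints are the real Hypothesis ones.

```javascript
// Timeline: gather an investigation's annotations, keep the date-tagged ones.
async function buildTimeline(investigationTag) {
  const resp = await fetch(
    `https://api.hypothes.is/api/search?tag=${encodeURIComponent(investigationTag)}&limit=200`
  );
  const { rows } = await resp.json();
  return rows
    .map(ann => ({
      date: (ann.tags || []).find(t => /^\d{4}-\d{2}-\d{2}$/.test(t)),
      note: ann.text,
      link: ann.links.incontext   // the direct link back to the evidence
    }))
    .filter(item => item.date)
    .sort((a, b) => a.date.localeCompare(b.date));
}

// Footnotes: transclude one annotation, given the id from its direct link
// (https://hyp.is/<id>/<url>), into a footnotes section of the page.
async function transcludeFootnote(annotationId, container) {
  const resp = await fetch(`https://api.hypothes.is/api/annotations/${annotationId}`);
  const ann = await resp.json();
  const li = document.createElement("li");
  li.textContent = ann.text;      // the annotation's commentary on the source
  container.appendChild(li);
}
```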
And one last point about this kit, and this is just an experiment so far, but students really don't like the writing tools that are available in these open-source wikis. And they're not wrong about that; they're really rough around the edges and hard to use. So we've been wondering if we can enable students to launch out into Google Docs as the editor that they use, and bring the results back into this page that is displaying the results of the investigation. And it looks like that's going to be possible. It even looks as though, by way of these Hypothesis direct links, we'll still be able to convey the footnotes into the article, right? Because we can just look in the export from Google Docs, find the links, and transclude the annotations. So it looks promising, but we don't really know how this is going to work out. I will say that my colleague Maryann Martone here, who both uses Hypothesis to gather raw material for the research papers that she writes and often writes them in Google Docs, is really hopeful about the outcome of this experiment, because she would really love to be able to flow annotations through her writing tool and into published footnotes. So we'll see if we can make that happen.

And that is the segue to a thing that Anita and Maryann have both worked on with me and Tom Gillespie, who's here: a project called SciBot. So Anita in particular is someone who is, I would say, working to increase the thread count in the fabric of scientific literature. When neuroscientists write up the methods used in their experiments, the ingredients often include highly specific research resources, well, for example, antibodies. That's the dominant kind of thing that we're talking about here. These are the ingredients of the scientific experiment, and if you want to reproduce that experiment, you want to know exactly what ingredients were used. These things have colloquial names, and they even have vendor catalog numbers, but those aren't unique identifiers. So the Neuroscience Information Framework, or NIF, where Maryann and Anita both work, has defined a namespace called RRID, which stands for Research Resource Identifier. And they've built a registry for these RRIDs, and they've convinced a growing number of scientific authors and journals to mention RRIDs in the text of articles.

So here's an article with RRIDs in it. Now, they're just written directly into the text; you literally write RRID, colon, and the identifier. And that's a very carefully chosen design, because it turns out that the only real permanent artifact in the scientific record is this text. And it's also a searchable archive. So they want the stuff to be in the text of the papers. But these are identifiers, and they need to be correct. So if you're talking about a polyclonal antibody, the idea is you look it up in the registry, you capture its ID, and you write it into the text. And if it's not in the registry, you go and add it to the registry, and make Anita very happy when you do that.

So the first phase of this project, SciBot, was about validating these RRIDs, because they're just text typed in by authors. Was the identifier spelled correctly? Did it actually point to a registry entry that resolves and brings back the record that you expected? To find out, we built this tool called SciBot that automatically annotates occurrences of RRIDs. So here, Anita is about to click on a bookmarklet labeled RRID, and what it's going to do is send the text of this article to a little backend service. That thing is going to scan the text, do some simple pattern matching to find the RRIDs, look the RRIDs up in the registry, and then, using the Hypothesis API, programmatically create annotations in the paper that anchor to the instances of the RRIDs and, when clicked in Hypothesis, open up the sidebar and display in the sidebar the registry information that came back, if it was found.
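A minimal sketch of that backend flow, assuming the real Hypothesis annotation-creation endpoint; the function name, the RRID pattern, and the registry-lookup URL shape are simplified assumptions, not SciBot's actual code.

```javascript
// SciBot-style pass: find RRIDs, check the registry, annotate each occurrence.
const RRID_PATTERN = /RRID:\s?([A-Za-z]+[_:][A-Za-z0-9_:-]+)/g;

async function annotateRRIDs(pageUrl, pageText, apiToken) {
  for (const match of pageText.matchAll(RRID_PATTERN)) {
    const rrid = match[1];
    // Hypothetical lookup; SciCrunch exposes a resolver for RRIDs.
    const lookup = await fetch(`https://scicrunch.org/resolver/RRID:${rrid}.json`);
    const note = lookup.ok
      ? `Registry entry found for RRID:${rrid}.`
      : `RRID:${rrid} did not resolve; possibly misspelled or missing.`;
    // Create the annotation through the Hypothesis API, anchored to the match.
    await fetch("https://api.hypothes.is/api/annotations", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        uri: pageUrl,
        text: note,
        tags: [`RRID:${rrid}`],
        target: [{
          source: pageUrl,
          selector: [{ type: "TextQuoteSelector", exact: match[0] }]
        }]
      })
    });
  }
}
```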
And so here we see that the Hypothesis real-time API has notified us that this backend service has done its work, and there are three new annotations on the article, and here they are, right? So now we have Hypothesis links on these RRIDs. Now, that's just the first step, right? And kind of the key point here is that we're in this zone of partnership now between humans and machines. Some of these things might be misspelled or missing. In this case, here's one that was missing: vglut2 should have been in the registry, and it wasn't. So now that information has been added, not directly to the paper, but made available to a reader of the paper, because it's been added in the annotation layer. So, you know, pretty much every automated process needs human curation and exception handling, and that's what this hybrid of automatic entity recognition and interactive human curation gives you. These corrections are now available to train a next-generation entity recognizer, so that, iterating through that kind of feedback loop, we're gonna be able to, I hope, mine the implicit data that's in the scientific record and make it explicit. Here's the Hypothesis dashboard for one of the SciBot curators. The tag cloud gives you a pretty good sense of how this process has been unfolding so far.

And now publishers have started to link to RRIDs as well. So here, this is a PubMed article, and there's a link on the RRID, ZIRC_ZL1. If you follow that link from PubMed, you'll land here, in what NIF calls the SciCrunch registry, and you'll see that there's a list of other publications that use this resource. So that's kind of what we do in the web that has existed for the last 20-some years: we point to documents. And of course, now we can do better. The main purpose of RRIDs, really, is to say: if I'm using ZIRC ZL1, which happens to be a particular strain of zebrafish, I wanna know who else has experience with that tool, as they call it. And not just the experience that was reported in the paper, but what's been added to the paper in the experiences that have been reported in the annotation layer. One of the ways that we could do that we've already seen: we could do that footnote transclusion thing. I mean, I think it would be cool if publishers did that. But there are lots of other ways to skin the cat. So here is just a little tool. You have a piece of text that is selected, and it says: find other uses of this RRID. And all it's going to do is link to a Hypothesis query for that tag. And there are all the other instances where it's been tagged. And if we open one of those up and click on the direct link, you go to the page, and you go to the context where the researcher actually used that thing.
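That helper can be as small as building a search URL. A sketch, assuming Hypothesis's tag-search syntax (the function name is hypothetical):

```javascript
// "Find other uses of this RRID": link to the Hypothesis search for the tag.
function otherUsesLink(rrid) {
  return `https://hypothes.is/search?q=${encodeURIComponent(`tag:"RRID:${rrid}"`)}`;
}
// e.g. otherUsesLink("ZIRC_ZL1") yields a query listing every annotation,
// on any page, that a curator or SciBot has tagged with that RRID.
```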
So the last piece of this, and then we'll get into a discussion about all of this, is gonna bring this full circle back to Andrew Schulman, who I mentioned at the beginning. Andrew now works as a patent attorney, and we reconnected recently. And he said, there's this tool that I use in my work; it's called a claim chart. What a claim chart is, for a patent lawyer, is a two-column table where column one is a set of selections from the patent. In particular, it identifies the claims in the patent, right? And they literally call them claim one, claim two. And in the second column of the table, there are statements in other documents that bear on the claims. And as you can imagine, in a legal tradition, there's a really strong need to be able to bring these things together and reason about them, right? Does this statement in the other document support, or not, the claim that's being made in the patent? And nowadays people build these documents laboriously in Word, or however they do it.

And so Andrew said, you know, it seems to me that maybe annotation would be a way that we could do this better. And I guess if there's anything I've learned about annotation in a couple of years at Hypothesis, it's that if somebody says, can you do X with annotation, the answer should always be: I don't know. Seems like you could. Let's try it and find out.

So this is an annotation-powered claim chart that we've come up with. It's just sort of a first iteration, but this is the layout. The daggers in the top left of each cell are Hypothesis direct links. The ones in column one, right, those are enumerating the claims, so each one is going to take you to the patent and to the particular claim that's being asserted; there's a Hypothesis annotation on that. The one in the second column is going to take you to another document, where another statement has been selected that relates to the first statement. And the toolkit does it this way: when I select the claim, I tell it I'm adding a selection as a claim, and I go through a process of identifying the claim. In the other case, I tell the toolkit I'm adding the selection as being related to a claim. And here we bring these two contexts together in a single view, right? So now I don't have to remember, or revisit, the tab that had the patent in it with the annotations about the claims. Because I can see it's not just claim one; it's, oh yes, claim one, a monoclonal antibody, and I'm now associating the sentence "it is a fragment of the monoclonal antibody," from another page, with this one. So once you've identified the claims in this way, they're available as targets of annotations in other documents, and it's easy to connect the two statements.
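A minimal sketch of how such a chart could be assembled from the annotation layer. The tagging convention here (a claim:N tag on the patent annotation, plus a related tag on statements elsewhere) is hypothetical, as is the function name; the search endpoint is the real Hypothesis API.

```javascript
// Group a user's annotations into a claim chart: claim tag -> {claim, related}.
async function buildClaimChart(user) {
  const resp = await fetch(
    `https://api.hypothes.is/api/search?user=${encodeURIComponent(user)}&limit=200`
  );
  const { rows } = await resp.json();
  const chart = {};
  for (const ann of rows) {
    const claimTag = (ann.tags || []).find(t => t.startsWith("claim:"));
    if (!claimTag) continue;
    if (!chart[claimTag]) chart[claimTag] = { claim: null, related: [] };
    const cell = { quote: ann.text, link: ann.links.incontext };
    if (ann.tags.includes("related")) chart[claimTag].related.push(cell);
    else chart[claimTag].claim = cell;   // the annotation on the patent itself
  }
  return chart;  // render as the two-column table described above
}
```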
So the last time I read Vannevar Bush's famous essay, As We May Think, this is the quote that really struck me: he runs through an encyclopedia, finds an interesting article, leaves it projected; next, in a history, he finds another pertinent item, and ties the two together. Ties the two together, right? So when statements in documents become addressable resources on the web, we can start to weave them together in, I think, exactly the way that Vannevar Bush imagined.

So with that: everyone with whom I have collaborated on the things that I've just discussed is here with me, and I would just like to hear from them about what's working, what this has enabled, what's broken, and where they think this can go next, and why.

I'm Beth. Because of the Hypothesis extension, you've reduced our workflow by a ridiculous amount. There was a lot of copying and pasting. I won't talk too much about the workflow, because I don't wanna mess up my talk for tomorrow, which is gonna be invigorating and wonderful. But essentially we went from having a Drupal 7 widget, doing a lot of copy-pasting, and kind of hard-coding things in HTML, to a much more automated system. There have definitely been some bumps in the road, which I think is inevitable when you have a custom extension and you're working on something together to try to figure out the best way to make it work. I know that for us, the most important thing is to have something that's reliable. So minimizing the number of steps in the process of getting from the contributors' annotations to pulling them in through our Learning Lens is kind of the number one thing for us, because if there are so many moving parts and there's like one thing that goes wrong, then basically you just have a research paper that's reposted online with no additional information. I could just keep talking, but I'll stop now. Anybody else?

So for us, one of the coolest things that I really like to do, and I don't know why, but it's just really satisfying (it's probably an OCD thing), is to open up a paper and run SciBot and have it actually come back with the entity. I don't know why that makes me so happy, but it really, really makes me happy. When it doesn't come back with annotations, then I email Tom and he has to restart the server. So that's a problem. We're working on that. But first of all, I believe that, as of the last time I saw Dan's talk, SciBot has the most annotations of any annotator, which I find very satisfying. Out of that million annotations, we own a significant chunk. It's great.

The second piece: so there's a lot of good news, right? But we've definitely found some really interesting things about how publishers do things on the back end of their webpages that we don't love as much as SciBot running properly. So, to any publisher in the room: if I grab you and start screaming in your face, I'm sorry, it's just a deep frustration. But the kinds of things that would make our lives much easier would be to figure out which version of an article we're looking at. Because an article is a web artifact, but there might be different copies of it strewn around the web. And if I have several curators working on something, two of them might run into the same document, and we don't know about each other until Hypothesis starts coming back and going, hey, you've already annotated this. Oh, well, that's no good. So there are little technical things that I think we really need to start thinking about addressing now. Because articles are supposed to be these preserved artifacts; we as a society have really said these things are valuable, we want to save them forever. But we also need to have the copy of record. And that's a general problem, because as soon as you don't know that information, annotations break.

So actually, one thing I'll jump in here and say, and this is a thread I've heard from, I think, probably all of you: Mike, I remember you said at one point that when you reformatted the textbook, it was two days of reformatting, and your process was copying and pasting. Maryann's mantra is: let's take the stupid out of science. And a lot of that stupid is people sitting at their desks copying and pasting. So, much as Nate dislikes the word transclusion, what that really says is that if, in the same way that we give addresses to web documents, we can give addresses to statements in web documents that are reliable and persistent, then we don't have to copy and paste things. We can refer to things, we can transclude things, and there's just less breakage, there's less stupid.

So another thing that really attracted me to this: we've heard a lot about semantics, and in the Neuroscience Information Framework, which you mentioned, since the early days we've been working on the semantics of neuroscience.
Neuroscience is an extremely hard domain to characterize. If you saw the announcement in 2014 about the US BRAIN Initiative, it says we still don't know how many cell types there are in the brain, or even whether the idea of a cell type has any validity, and if so, what it is. Because we're on the forefront. And so at various times people have come through and said, we need to develop these formal knowledge models. And I'm like: we need to develop enough of a shared semantic that's computable so that we can reasonably make assertions, so that someone can say, I think this is what this means. But it's really in the usage of these things, and in the data, that you determine whether these assertions have any validity or not. Is this a good concept, is it not? And the idea that you can formally structure knowledge, or a paper, from the get-go, I think, is just false. Over time you refine your model, you come back, you loop back, you go forward. Different statements in the paper can be interpreted in different ways.

So what I liked about Hypothesis was that it was a lightweight linking layer over these things, with nothing built into it. Because the weight of the things you have to say about something, even something very simple, to fully characterize it, is enormous, and this lets us use this hybrid between what computers can do and what humans can do. We supply some structured knowledge. Yes, sometimes it's very easy for us to look at a string and say "gene." Other times it's very difficult for us to look at something and say anything, because we don't know what the nature of these things is.

And I had a very interesting experience. Our group built a semantic wiki for neuroscience concepts a long time ago, called NeuroLex, and NeuroLex was a very useful thing. It was built on Semantic MediaWiki, but it let you link pages, of course, and also concepts through relationships, rather than just a hyperlink. So you could say this neuron used this type of neurotransmitter, this neuron was here. It was very powerful. It was a really powerful tool for teaching people about why, when you have a URI and you have a concept that's machine-readable, you can do a lot of things with it. But at one point we created a template to ask neuroscientists to fill out information about these neuron cell types. We had 40 worldwide experts come in and do this. First: these are some of the smartest people you're ever gonna know. They could not use a wiki, right? It was a real barrier for them, because that's not their expertise. They could put very fine electrodes into neurons and they could record things; they could do things in a lab that were artistic. But the wiki was not worth their time, and it flummoxed them.

But then I was analyzing the results at one point, right? Because basically the knowledge that came out was a triple: there was the subject, there was the predicate, and there was the object. I noticed that some things were really easy for scientists to think about as a triple. So, neuron in brain region X: very simple, they knew that knowledge. But as you got deeper and deeper into the characteristics of neurons, everything turned into a red link, right? Basically, there were no completed triples, and when you looked, you saw that the scientists reverted to natural language to try to capture the nuances of what they were trying to say. So there were some pieces where they could very easily turn this into a formal knowledge structure. There were others where you could see they were incapable, in that construct, of saying what it was they wanted to say.
So they reverted to the power of language, right? Where they could then say "but sometimes," or "maybe." And you could actually watch this evolution, and it was a very fascinating thing. So when I saw Hypothesis, and I saw the ability to link across documents without this rigid formality, but allowing the expression, right, the caveats, the notes, the things that were bothering people as they thought about it: you've now narrowed down the problem. I don't have to digest an entire paper, and another entire paper, and draw links; somebody's already done that for me. And they've been able to express, in a very natural and easy way, what they thought this relationship was. And if another person came in, they'd likely give you something else, and somebody else might give you something else, and over time that may change. But it showed you that there really is no sort of universal curation. This idea of making these very rigid structured things works okay, but you need this flexibility. You need the interface between someone who has that deep domain knowledge and the things that they're referencing. Then the computers, instead of operating on everything, have a narrowed-down subset, and can probably use NLP or other things to make it so that this researcher never has to write a triple, right? Because if you ask some of my colleagues... I remember I was gonna write a paper once: do I have to write in triples? Is that what I'm being asked to do? You can't do it. So I find the opportunity to create these dynamic, evolving linking layers, which allow this partnership between humans and machines to develop to the next level, without imposing on the people who understand these things the necessity to learn how to write Markdown or markup or any other sort of language, right? That's an imposition, and it takes me out of the things that I know, into tearing my hair out trying to figure out how to use this tool. That is why I like this lightweight connecting layer on top, as I like to say. I like the increased thread count, because that's what it does.

And maybe we're going to, you know, in Mike's world, be leading students to a place where... I mean, citation, in the way you three in the world of science think about it, is a highly specialized and, in a way, elite activity, right? What you're trying to do is get students to cite things in a more, I don't know, informal way, right? But to inculcate that habit of thinking, that practice.

Yeah, so, you know, speaking as a person that teaches students: wikis, or anything like that, are hard, and the reason why they're hard for students is that things like wikis try to train students to use really structured data. And someone once asked Ted Nelson, you know, what the downfall of the web was, and his answer was simple: fonts, right? It was fonts that killed the web. Very Ted Nelson-esque, but it's true. I mean, what's happened is that we've kind of devolved from the web as a really semantic phenomenon to a WYSIWYG phenomenon, and that's the world people live in. So how do we overlay semantic information in a way that WYSIWYG doesn't wipe out? And that's a big question, right? Because if we could do that, then maybe we could work happily in some of these WYSIWYG interfaces without just completely, you know, zapping all our data every time we reformat.
So the thing that excites me in that way about, you know, Hypothesis and the URLs is: here's something where, instead of taking just an HTTP link, right, and then, you know, pulling a quote and adding that quote in or whatever, you're able to take this Hypothesis link, which will link to a specific anchor point in that document, and may also be able to pull other information that's been added to that document, about date or subject or, you know, specific IDs or things like that. And you can carry that around, and as long as that link doesn't get nixed as that document moves from place to place, that information doesn't disappear because someone decided to, you know, select all and, like, change it to Calibri, which is the place where we are now, right? So I think that's the exciting thing: teaching students to use these, to reclaim these URLs as something that expresses meaning, right? And to lay them into a document, and then the document generates the footnotes, right?

Yeah, no, thank you for bringing up that point and beating it to death. But no, if the annotations break because of, you know, changing to Calibri, then we're kind of nowhere. Although I can't say we're nowhere, because we still have all of those annotations on the back end, which is great; you just can't see them anymore. And that's the thing that we really have to fight against. Especially, again, in the scientific literature, where you're really trying to annotate an artifact that's gonna be there for 100 years. And so if you can do that, that would be a really big step in making sure that you actually preserve the annotation layer on top of this 100-year artifact, right? The annotations can't break after a year if your artifact is meant to last 100 years. So we need to figure that out. It is something that we can figure out, right? It's not that difficult, but it is something that we're gonna have to address as a community.

I think that was something, also, as soon as I started doing this: this longevity, right, was one of the things that my colleagues immediately asked about. If they were going to invest time in this, this was not ephemera, right? This was not something where you started a little conversation and it was a tweet that sort of went on, right? These are now data, and these are things that you need to preserve. So immediately, one of the first things we asked for was the ability to export our annotations. We saw Pundit had that, right? And we wanna export them and we wanna reform them, right? Because these are now serving as data for something else that we're going to be able to do with them. So we want to be able to manipulate them, but we also want to be sure of preservation: my explorations over the last two years with Hypothesis represent my corpus of scientific work. I used to keep these in reference managers; I used to keep them someplace else. I need to be sure that I can preserve that in a form where I can go back to it, because it often takes two years to complete a study and two years to write a paper. So it does have to have some permanence, right? And then, again, we started to see, with the push towards activity pages, that this constant stream, a la Twitter, was not a useful way for us to gather these, right?
There were multiple annotations attached to a URL, and even in science, as Anita says, the URL is not sufficient, because that article exists in five different places at five different URLs. It's the DOI, or some other object, right, that says I'm actually attaching these annotations to some third category, not the individual instances themselves, and that needs to sync across. So I think that starts to point something out. I always like to say for myself: if a tool is useful (and not just because I work for Hypothesis), I will use it all the time; it becomes an essential part of what I do. But I don't want to invest in it if I think it's going to go away, right? I want to make sure that I will have access to it and that I can use it, which is why I think the open standard was so important.

But we did touch a little bit on the kind of referencing involved, and one of my favorite lines from Jon is: some people annotate not just because they like to annotate, but because they're actually trying to do something else, right? They're annotating during the course of something else. And this is something that I think is really important in science. And we talked about the fact that the footnotes were great, but APA style doesn't call for a footnote, right? It calls for a formal reference, a formal citation. And right now, that still involves a whole lot of copying and pasting and all kinds of torturous paths through different tools. But we ought to be cognizant of the fact that if we want annotation to become a regular part of this, you need to understand the end goals, right? The claim chart is what Andrew needs. He's gonna use annotation to get his claim chart. That's what he needs, right?

I think that we cannot overestimate this. So, I mean, I think maybe you told me an estimate at some point of what percentage of the NIH budget is spent on reformatting and copying and pasting, right? And it's non-trivial. And that's true in every domain, right? So again, it's such a simple, basic property of annotation, but just the fact that you can refer to and move around resources that are canonically addressable, right, at the level of a word, a sentence, a paragraph, in the same way that you talk about documents as a whole, is just an insane efficiency boost across almost every line of business that I can think of. I mean, it's just a baseline better way to do things.

That's what I was just thinking while you were talking. So basically, in Science in the Classroom, the same article is available on a different webpage, and also available in PDF form. Across who knows how many users, being able to annotate it in our space, and then somehow have that also be plastered over the Science version, would make it doubly useful. But I don't know how that would even be possible. Well, it is possible, especially in an automated way. It often works, except when it doesn't. Yeah, exactly. It absolutely is possible.
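A minimal sketch of one workaround for that copy-of-record problem, under the assumption that you know, or can discover, the article's URL variants; the function name is hypothetical, and the durable fix is equivalence keyed on the DOI, as described above.

```javascript
// Gather one article's annotations across its known URL variants
// (publisher page, PubMed Central copy, PDF, ...), deduplicating by id.
async function annotationsForArticle(urlVariants) {
  const seen = new Map();
  for (const uri of urlVariants) {
    const resp = await fetch(
      `https://api.hypothes.is/api/search?uri=${encodeURIComponent(uri)}&limit=200`
    );
    const { rows } = await resp.json();
    for (const ann of rows) seen.set(ann.id, ann);
  }
  return [...seen.values()];
}
```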
So we were thinking we would open it up to conversation, and we're actually gonna try a slightly new thing with the questions. At Esther Dyson's suggestion, my lovely assistant, Christoph, is going to act as a mic stand, and we are gonna invite people who wanna ask questions to come up to our living human mic stand and ask the questions there. It's a Christoph stand. It's not a mic stand, it's a Christoph stand. That's right, Christoph, actually. We're gonna call Mike. Hello? Hello. There we go.

How about organizing annotations: not just storing them in a single centralized Hypothesis database, but having multiple databases, where you either exchange annotations or somehow are able to search across repositories of annotations. Is that work that still needs to be done, or how do you imagine it going forward?

Oh, that's absolutely central to the vision for Hypothesis, right? It is an implementation of a set of standards, and this will only succeed if there is an ecosystem of interoperable annotation services and clients, and a standard way to exchange annotations among them. And I mean, we heard this morning about exactly that, right? This is what the W3C has done, actually: to say, this is how we will agree to talk about what an annotation looks like, and this is how we will agree to talk about exchanging them among systems. Absolutely. That's fundamental.

One more: I also noticed that there's no copyright, other than releasing to the public domain, when I annotate using the Chrome extension for Hypothesis. Is that something that... I imagine some people might want something slightly more restrictive. In Hypothesis, and that's just our policy, right, the policy is that a public annotation is released under CC0. Got it.

Hi, another voice from LibraryLand. I know we've been in Twitter conversation about handling annotations as some kind of an event, a scholarly event. But I have to say, first of all, that from the library world, we're just suffering from fatigue at the extreme need to treat all these new forms of scholarship as first-class objects. And while in principle we're your first, you know, acolyte, and we're so deeply supportive of the idea, we're also underwater financially. We're already struggling to treat software, data sets, and all those other objects that are part of a new 21st-century scholarly ecosystem. There's no question that annotations deserve that treatment. And we've talked about, you know, if you issued a DOI and they got registered as a Crossref event, then they'd be very easily trackable through any system, and probably, really, the infrastructure isn't as hard in that model. But again, it's: who the heck pays for this? You know, we keep looking to the commons to pay for this. And I work with some other folks here, on Research Data Alliance efforts, FORCE11 efforts, to try and find a sustainable path to handling, curating, and really treating all this other stuff besides the paper as first-class objects. It's not that we don't all share that value and really want to make it happen, but it's a completely unfunded mandate. And I just don't know. And libraries just are not the backyard people can keep pointing to as magically being able to do it. I mean, we seriously need help.

Another reason to get rid of stupid in as many places as we can: the more time and effort we spend on things that are unproductive, the less we have to devote to things that now do need our attention. And yet we are carrying this huge burden of inefficiency, which, you know, I think characterizes most domains. But it's particularly, you know, kind of galling given that, as I always like to say, anytime a computer scientist or programmer watches me do something, you can just see the pained look on their face at what I'm actually doing. And they're like, well, why don't you do this? And I'm like, I don't. I have to work through the user interface.
And that means awful, repetitive cutting, pasting, hair-pulling, jumping back and forth between incompatible document formats, all kinds of stuff that you have to do. And it's technically not necessary, right? Practically, you're never gonna get rid of all of it. But because of these unfunded mandates, we can't be spending time on this anymore, right? We have to be able to turn our attention to something else. And I think the time to do that is really now.

I'll also say, just as an aside: you know, I don't think every annotation needs to be preserved. What I like about the direct link is the question of when I decide that this thing does need to be preserved, and when it does not. You know, when do I invest extra, and when do I not? We'll never get that 100% correct, but that's what libraries do, right? They decide what the culturally important things are, what the things are that we do need to invest in. The other stuff, maybe it'll get carried forward because of many copies, maybe it won't. But you know, society won't collapse if we don't carry everything forward. The brain doesn't carry everything forward. You know, we manage.

So, Maryann, did you develop the getting-rid-of-stupid-in-science idea when you were working on that APA paper? Yeah, okay. Yeah, that's what I thought. Anita, was there anything you wanna say about the possibility of business models around this? Yeah, oh no, no, no. I'm not sure about that. But I mean, there are certainly a lot of business models where there are cost savings that can be realized. The reformatting of references is one of those things that all researchers just need to stop doing, because the really funny part of that is that the typesetters pull everything out anyway and redo it. And the fact that we're not all just putting PMIDs into the reference list, I think, is just idiotic. Thank you.

So, a few super quick interactive questions. You guys are big annotation people, and obviously some of you, at least, read a lot of papers. Do you print out the papers and take annotations on the papers, or do you do it on the computer?

So I would say one of the most profound changes over the last two years with Hypothesis is that my reading habits have completely changed: from printing papers out and scribbling in the margins, or downloading PDFs, or even uploading them into reference managers, to doing my reading on the web, with tags (and not even necessarily explicit tags) as the way that I interact with the literature. Because I find web-based annotation so useful, beyond anything else that I use. So it actually did flip me over, because I used to hate reading online, but now I would rather read online, because I get this advantage.

Can we get a Hypothesis chant? Hypothesis! No, it's just me, sorry. Do you guys still print stuff out? Do you use it on the computer? What tools? Me? Yeah. I feel like I don't read anymore; sad, but it's true. Usually, I'm using it to actually write the annotations on the paper that I'm reading, so I do almost everything web-based. I rarely print anything out and write in the margins. I can't read my own handwriting anyway, to be totally honest. No, I've stopped killing trees. I thank Hypothesis for that.

And, go, what? So, do I print things out and read? Is that the question, or? Did you print things out and annotate is the important question. Or how do you take notes? Like, I print things out and then I type my notes in a text document.
Yeah, so right now I'm kind of a OneNote fanatic, and I like having clippings and copies of things, because things disappear, and that's of course the big issue, right? So OneNote allows me to take a snip of that page and keep it. But yeah, I don't print things out; I generally use that. I think Hypothesis is a piece of that. The only reason why I end up defaulting to OneNote is that I've just been burned too many times by things disappearing.

I should say, interestingly, just to your point: the only things I print out and scribble on by hand are things I am writing. The only way that I can review a draft and quickly interact with it, flip back and forth, is on paper; I can't do that electronically, I don't see it in the same way. So the one thing I do print out is anything that I am writing, and that is where I do my editing.

Cool. Okay, then the second quick question is: how many people, maybe in the whole room, including you guys, keep lists of quotes or ideas? And I'd like to ask the speakers, what tools do you use? Do you do it in a real notebook? Do you do it on a computer? What software, if so?

That's my Hypothesis quote tag. Okay. I feel like a loser, because I don't do any of the things you're asking. It's okay. Okay, we still love you.

So I tag things, and I've been doing that for a very long time. I was one of the original biggest fans of the thing called Delicious, probably the first social bookmarking service. And I think I still fail to fully explain to most people how important a change that was. But the example that I often give is like this, and you'll see this all the time now. Someone will say, can you give me a list of things? You know, whatever it would be, movies or whatever. I want a list of things. So someone will open up a document, make a list of things, and send you the document. So they've sent you a list. What I prefer to do is put that list onto a tag and send you the link to the tag, which means I have sent you not the answer to a question, but a way that you can answer the question every time you visit a URL, and get a possibly different and better answer, right? And that cognitive shift is still, after all this time, I think, not really appreciated by an awful lot of people.

Awesome. And just the last thing: so I take it you use OneNote also for that? For what? Yeah, I use OneNote, or a wiki; I'm kind of a wiki fanatic. The big revelation for me, in terms of keeping notes, was, instead of breaking notes out by document, to actually decompose documents into the ideas in the document, and have multiple separate notes from a document that then could be combined, and slowly expanded into something that captures an idea across many documents. So one of the problems with a lot of tools that we see out there (and Hypothesis is an exception to this) is that they treat the document as something you wanna respond to, rather than trying to extract an idea that is bigger than the document and start combining it with other ideas, right? To me, that's been one of the big revelations: the way that we react to whole documents is really constraining the synergistic possibilities of the web. So you do use OneNote, then, to keep lists of ideas and quotes and stuff? Okay, awesome.

So, just asking as a citizen this time: you guys are working in the fields of science and education, and we've heard from journalists, and about scholarship and publishing.
I'm curious about this idea of the annotation layer operating as a kind of highway over which information can travel, a layer that sits above the document layer and has an existence outside it, so documents can come and go, and the highway can remain, and new things can travel on it. What other fields do you imagine being radically transformed, or that could be radically transformed, by this kind of transport layer in practice, beyond education and beyond science? Have you thought about that kind of question? Or maybe we all should, and just think about what other kinds of human practices might be affected. What else is there besides science? Sorry, I know. And education.

So I have an answer to that that's a little techno-utopian for 2017. But initially, when the web came out, a lot of people conceptualized it like this: a browser goes and reads pages that are designed for a web browser, and the web browser displays the pages. That was kind of the model that people had. But very quickly people built around the edges of that and started understanding: well, we're delivering HTML to the browser across this set of protocols, and we can do a variety of things this was not designed for. And so in 1994 you suddenly could order a Pizza Hut pizza. I mean, this is true. 1994, check it out. Through the web, right? And how does that happen? Well, there's a form that goes to a thing that produces a fax that goes to the Pizza Hut. Fax, I know. Web and fax. There was an overlap for, I think, like one year.

So I think that's the interesting thing about this: you can think about things like Hypothesis or Genius as a product, and I think that product, just like web browsers had to develop, has to get much more polished than it is, and much more user-friendly, and much more intuitive. But co-developing with that, you have to see this set of protocols, this addressing space. And I think the real future, if we grab it, comes from those people, you know, the same people that were the Perl hackers in 1994, '95, that said: wow, you can build a store with this, you know? It doesn't just have to be a document. I think there's a set of applications out there that we can't imagine, but we have to really create the developer community that can patch together quick and kind of messy solutions to things, so that we can explore that space and eventually get to some of the more polished stuff. So the answer is, I think it could be like Jon says: the answer to "can annotation do that?" is "who knows, but let's try," and that's how we get there.

I'd actually give a different answer to that, which is that there's a field that already has been transformed by this, one that we all take for granted, and we probably shouldn't: software development. When it comes to living in the information world, programmers have always been living in the future relative to the rest of the population, because it was always the case that the work process and the work product were digital and networked almost before anything else was.
So it is completely taken for granted on GitHub, in that work environment, that I, with my colleagues, can point to a line of code in a revision of a document, and we can have a conversation about that particular point in time and that particular location in the code. And as for the precision and accuracy of that method of collaboration: let's just say that without it, the software we know now would not be possible. And I think the people who live and work in that environment have so internalized it that they take it for granted, and also don't fully recognize the degree to which people in most other lines of business are not supported by that level of tooling and sophistication, and that we need to bring everyone up to the same level.

Hi, I'm Joshua Choi, medical student at St. Louis University. So I just wanted to make a comment from my own experience, starting from the mention of NeuroLex, actually. I found it quite useful during my research on the basal ganglia, so thank you for that. But I also was quite inspired by your mentioning of the trouble you had getting normal people, like world-famous neurologists, to input structured data. Yes, in my experience there's something very similar when it comes to physicians having to input patient data. We have a lot of problems when it comes to that. We're inputting a ton of data, from friends, family, the patients themselves, labs, and it's super underutilized, because it's all scattered together and we're not able to really comment on it, or point to specific places, in order to express and document our clinical judgment and reasoning. There are a lot of EHR, or electronic health record, systems today that try to get us to input structured data: here's every single symptom the patient mentioned, positive and negative; here's every single examination finding that we have. But all that misses the point if I still can't point to a specific place on a graph of a patient's glucose during some admission and point out: this is why such-and-such, or this is why I think they have this diagnosis or not. So I'm hoping that one day annotation technology can help save lives, frankly, by helping physicians make fewer mistakes, be more efficient, and document, express, and communicate their clinical judgment together: by annotating raw data, so to speak, and annotating each other on that, all auditable and such. There's something here that might fit in the strange liminal boundary between structured data, which humans usually can't do (at least normal people, like the neurologists), and people wanting to input natural language. Oh, I could go on and on about physician notes and the kind of copying and pasting they do. I'm hoping someday annotations can help with that.

So I'll tell you my one story, which I still think is relevant to that. Electronic health records and human language, I think, is a big thing. Many things attracted me to annotation, but one of them was the capacity to create an interoperable knowledge layer. Because as things pass through different stages, which they almost always do, by the time you go from one place to the next, there's valuable knowledge, valuable comments and things that get put on there, but they're lost, right? Because every time you switch over from one system to the next, whatever comments are there often go away. Or even when they're in the same system. That's exactly right.
So if you could look very closely at my hand, you would see a long scar going down my fingers, and that was a very unfortunate interaction between an automatic nail clipper and my cat. Okay, she raked my hand and ripped it to shreds. I went to the emergency room, because by the evening I couldn't actually use my hand, and cat scratches are dangerous. And the woman there, first of all, because of the electronic health record, she has to turn around with her back to me and sit there and try to type it in. And she's looking for animal scratch, animal scratch, cat scratch; can't find it. She's like, eh, animal bite. So she put it in as an animal bite, so I'm incorrectly coded as an animal bite when I have a cat scratch. And I said, that's how people get frustrated. It would have taken her five seconds to say she was scratched by her cat, okay. And that's why, I mean, there is this interaction between deep structure and the flexibility that language actually has.

But to me, again, when you actually look, you can see that some things lend themselves very well to structure. My name, my social security number: I can structure that. And the more we deviate from that, the worse life gets. There are some things that are fairly easy to identify and that you wanna structure. But the further out you get, the harder it is for people to express this type of knowledge, the more time it takes, the more frustration. This is something else that I think Jon has really appreciated, and certainly the developers in my laboratory: when people are using these tools in real-life contexts, okay, as opposed to some sort of idealized development setting, they are using them under pressure. They are using them because they need to get from A to B, and the more barriers that are put in between, the more likely they are to code you, right, as an animal bite, even though this was not an animal bite.

I actually had a friend who was a bioinformatician, and she said they pretty much assume there's a 20% error rate in most of the data that they get. Part of that is that there are 40 different ways to measure blood pressure. They know that nobody actually records how they measured blood pressure, and people are gonna have to put something down, because it's a required field. So they'll put something down, but then the analysts ignore it, right? It's a lie. And I think this is really underappreciated: when you use these tools under pressure, under duress, under deadline, every extra click becomes a frustration, right? Every time you need to copy and paste becomes a frustration. So what you end up doing, right, is whatever it takes for you to get your job done. And that's something else I think we always have to keep in mind: when we design these tools for a very narrow purpose, we think they can be used in these contexts, but in the field they fail utterly.

Yes, it is. In that annotations are machine-readable, in that you can analyze the relationships between comments, and between what different people say and what they're talking about; but the approach also lends itself very well to natural language, in that the content of the annotations themselves might be written in natural language. "A cat scratch," right? Or, even if they were forced to code it as an animal bite, maybe it would be nice if everyone else who saw it from then on saw a comment on it that said: actually, it's a cat scratch.
Hi. We talk about interactions with ourselves and our work through a software machine, and with Hypothesis and other collaboration machines we collaborate with other people through these machines. So: human, machine, human. I'm wondering, have these annotation projects changed the human-human interactions? Again, you're using human-machine-human types of tools; how has that changed the human-human connections? Douglas Engelbart would talk about there being two sides to any augmentation setting: one is the human system, the other is the technical system. And so I'm trying to ask a human-system question. How has the human system been changing? Thank you.

I can give you one example from probably the dominant use right now. A lot of the use of Hypothesis is happening in the classroom, and the interaction between teachers and students, and among students, all focused again on particular passages in texts, right, has been a compelling thing, in particular in higher ed. And Jeremy Dean here will tell you for hours about the kinds of models that people are developing, and there are lots of different ones. For example, a teacher can pre-annotate a document, a class reading, with a set of things for students to respond to, right? And then one model might be, I want the students to respond to me; another model might be, I want to actually have the students converse with one another. All this stuff is happening, and again, it's happening in a way where it's not just "tell me what you think about this book or this chapter," right? It's "tell me what you think about this paragraph," and it's happening in that context, directly in that context, literally in the document, right? So that's my best example.

Whenever we had people co-annotate, which actually ends up making better annotations overall, they used to have to pass a PDF back and forth and then comment on each other's comments. And I just recently co-annotated a paper with a group of authors, and it was seamless, because we could just kind of talk to each other in the comments within Hypothesis, and if we were all working on it at the same time, we could have a kind of collaborative effort, and we're not even in the same state. So I can only imagine: we have a lot of students that are also team-annotating papers in class. So that lets them interact both human-human and with the machine in the middle. I don't know if it's a different answer for you guys, because you have a little bit of an automated system, but...

So, I mean, I would say, about human-human interactions: it actually has changed the way that I now communicate with, for example, my lab members. When I want to communicate something to them, I often do it with an annotation now, right? So it's basically: this is what I'm talking about, this is where I'm talking about it, and I'm sending you a notification, and I'm telling you I would like you to act on this thing. So I'm using it as a messaging system, but without having to copy and paste the URLs and the thing I'm talking about; I basically launch it and say, can you take care of this? It was also interesting, because I was sitting on a review panel that was fairly intensely neuroscience-centered, and I was asked to essentially fact-check a set of computational models against the papers from which they claimed to be getting their parameters.
And I actually got annoyed that I didn't have direct links, because I had to spend an inordinate amount of time poring through these papers to try to figure out exactly what they were talking about. And I said, we really ought to enable this type of activity, because it doesn't help if you give me nothing more than "I referenced this paper." So I find that, as a messaging system too, it's just a lot easier than even composing an email. It's like: here's a subject line, add this to whatever. It's changed my modes of communication. Good, thanks everyone, and thank you guys for playing along with my crazy experiments. Thank you.