So I'm Peter Brantley, at Hypothes.is, and many other places before then. And I just wanted to start with this really interesting thing that just came out via a friend at paidContent. And this has very little to do directly with my talk except for the fact that this is a put-together Bible, or Gospels. And evidently, the second image has annotations there by a king. And the comment was, sorry, going back up: "The annotation on the second image below records King Charles's disagreement with the way that the Little Gidding group had arranged the text of the Sermon on the Mount." And in the third image, Charles points out another error and then crosses out his annotation, writing, "I confess that I was too hasty, for it is very well, but two little omissions that I have marked." So here's a very interesting recorded annotation, and then an edit and deletion of an annotation, in a physical artifact. So I think that that's really interesting. Okay, so what I want to talk about is the work that Hypothes.is is doing, and particularly the work that we're doing in the context of a broader community effort at the W3C in the Open Annotation collaboration, which is co-headed by someone in the audience, Rob Sanderson. Where is Rob now? There he is, yay. My poor recording can't figure out what I'm pointing to, but that's all right. So I'm going to take the advice of Ed Summers and do something really kind of crazy, which is start with a demo. So this may or may not work because this code is really new. We basically got it yesterday afternoon, so we'll see if it works. So what I want to do is I'm going to try to make an annotation in this PDF, which is from the new academic journal eLife, and then I'm going to toggle back to an HTML version of the article and hopefully we'll see the annotation there. Now a couple of quick things: I'm going to use slightly different technologies for the PDF and the HTML version, and two different browsers. And I just want to explain that real quickly.
They'll merge, first of all. But secondly, Firefox, starting with version 19, is now shipping with something called PDF.js installed. So PDF.js is a JavaScript library that renders PDF natively in the browser. And this enables web control over the PDF in a really nice, web-friendly way. PDF.js has been in the works for a while, but this is now working code in the browser. This isn't in Chrome or other browsers, but similar functionality is expected. There are ways of importing PDF.js as a handler into other browsers, and so that's one of the things we're looking at. But PDF.js in Firefox makes PDF a very interesting experience. So in the HTML version, I'm going to use Chrome. And in Chrome, also with very new code on our part, we have an actual extension loaded. So here in Firefox, I have a bookmarklet. And in Chrome for HTML, I have a brand new, newly minted extension. Not in the Chrome Store. So don't try this at home. You won't find it. So this is really all quite new, quite raw. And so briefly, what Hypothes.is is: we are a small company of three full-time individuals and a cast of many others who are working to build on the W3C Open Annotation specification to construct a software stack, from the storage layer up through the UI, that will enable open annotation of web documents. And by a document, I mean in the broad sense of any media representation. So obviously demos work best with text, but we definitely want to be enabling the annotation of images, of video, of data: genetics data, protein data, numerical data. So as long as it's a referenceable object, we're trying to create software that enables the creation of annotations on those documents that are themselves referenceable through URLs, ultimately, and that enable cross-document contextualization. So there's a lot more beyond that that has to be done in order to create a workflow around annotation, some of which I'll touch on briefly.
Another major component of this, particularly in a scholarly context, but also in the broader web, is reputation and identity. And so coming up with a reputation metric is also something we're very much concerned with. And again, I'll get to that a little bit later. But let me demonstrate here. So I am presently just signed in on a test account. And here you can see, on the right-hand margin, indications of annotations. The coloring is essentially a heat map, indicating that there are multiple annotations in the document. And if you point to any one pointer, then the side panel will expand, and you'll be able to see the details of the particular annotation. These are scholarly annotations, but just a caveat that they are ones that we've constructed. So this is not a real conversation among scholars. This is a pseudo-conversation, just to give you a flavor of what this might look like. And then if these annotations themselves had further comments, then you could expand these by clicking on them and so forth. So we're building various kinds of navigation here. And I think more apparent in the HTML version will be some of the editing controls that we're building in for editing and deleting a user's own annotations and so forth. And I will caveat that this is all very sensitive still to loading and priority, and we're really working on a lot of that. But I will make a draft annotation here. And here you see I'm making this annotation public. And spelling complex words under stress is never a fun point, right? And we do support Markdown syntax, for folks who know what that is. It's just a simple way of embedding HTML layout instructions. So I'll save that there. And then collapse that. So now I have an annotation there, which you can see there. Okay. So now I'm going to do the risky thing here. Okay. And let me come over to the Chrome version, and I'm going to reload this because it's been a while. And this also forces a sync.
The extension is noted here. Let me see if I'm still signed in, and it will take a few minutes to load. There's a little bit more functionality in the bookmarklet yet compared to the extension in terms of some of the navigation controls. So for example, with this down button we want to navigate to the next available annotation if there's not one on a page, and so forth. But let's see if I can find it. You may have to help me find where it was; I should have kept better note. Where the heck was my own annotation? Do you remember? Aha. Okay. Voila. Okay. So here I am. So a comment that I made in the PDF version is now showing up in eLife's HTML version of the article. Now this is obviously a painful process for developers, but highly worthwhile, because so much of scholarly literature is still represented in PDF instantiations. And it's also the kind of cross-format linkage that we want to embody beyond strictly HTML and PDF, trying to figure out over time how to establish linkages between content that has strong association. Right. So if you're making an annotation in one scholarly representation that has a close surrogate in another format or another instantiation, we want to be able, when appropriate, to overlay annotations. So that is all obviously a potentially hairy problem. But this is a path forward. Now this is aided in this case by very clean naming by eLife of the HTML and PDF surrogates of the article. And so obviously, to the extent that that is smooth and regularized, we are greatly aided in our work to do this. Certainly the newer journals like PLOS, PeerJ, and eLife are very good at this. With older journals or larger publishers that have more complicated workflows, this gets to be more of a problem. And so one of the calls in the greater publishing world is to establish a higher naming affinity between representations of their content. And that is not always the case, unfortunately. So, okay, demo done. Yay.
So now I just want to go through quickly what we are about and what our aims are, and to draw in a little bit of conversation about how annotations might work and what the challenges are. And I hope to be able to fly through this fast enough so that we have ample time for some discussion and questions. Okay, so Hypothes.is, as I mentioned earlier, is a very small not-for-profit with the audacious goal of trying to enable the annotation of all knowledge on the web. Why start something new unless you can be audacious? And so that's why we started here. This is my call out to Rob, which he will appreciate. This is actually close by our offices. We are in San Francisco, on one of the piers, actually underneath the Bay Bridge. And so this lovely corsair was floating underneath the bridge at the time. So there are a lot of historical antecedents to digital annotation. And there's perhaps, to some extent, an obvious question: why is it worthwhile trying to create a system that embeds inline representation of engagement, as opposed to the existing footnote kind of commenting that we have now through systems like Disqus and others like that that you're familiar with in blogging software? And I think it just helps to point out how things could be if they had been different, right? And so this is just a copy from NARA of the Magna Carta. And it's an interesting thought experiment to think about, if this had been born in an age where annotations could have been recorded, what kinds of discussions would have led to the final elaboration of each of these principles. And there were many of them. And what would have been the contextual disputes that finally led to the settling of that document as it is, and which was greatly revised over time? And yet that struggle was lost.
We can only recapitulate it through our examination of the revisions over time as they occurred and through very secondary materials relating to the conflict between the feudal entities involved. And so, obviously, if we'd been able to capture that discussion, we would have been able to understand quite a bit more about the intellectual context in which the document had arisen. And yet we've struggled with this. This is sort of the ur-example of print-based annotation, this page from the Talmud, and you can see that this is the original content in the center with the annotation arrayed around it, right? So this is how one very formalized aspect of annotation was recorded in print, and clearly we can do so much better in a digital environment. And I hate to call out this one individual, Vannevar Bush, but really it was during World War II, when Bush was very active in the Allied war effort and actually very instrumental in the Manhattan Project, that he did write about the sensing of a networked age. And it's worth noting that the networked mentality is really different than a non-networked mentality. To be able to accept a distributed form of communication around the planet, or to presume that that might be the case, leads one to different kinds of insight and discovery than might be the case when you cannot envision that. And so it is in that context, in that frame, that Bush was able to write that text describing how a researcher might navigate different findings, and say, "Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item," assuming that that itself would be referenceable and followable by others in a way that an author of the Talmud could never have envisioned.
So in this age, we're seeing a lot of different kinds of annotation efforts emerge on the web, and they're trying in many different niches to come up with many different kinds of functionality, and it's really an interesting process to watch. I think one really interesting bellwether was the funding of this organization, Rap Genius, by Andreessen Horowitz. And Rap Genius intends to do what it suggests, which is to provide annotation of rap lyrics by communities that are informed of the meaning of those lyrics, and to provide context that's available in Wikipedia or other sources of, for example, urban lore or other culturally aware materials. But for us at Hypothes.is and in the open annotation community, what was notable was Andreessen's own annotation of the funding notice, in which Marc said that in 1993, when Eric Bina and he were coding Mosaic, they had actually built into one of the early Mosaic builds support for what they called group annotations, realizing that that would be a necessary component of a World Wide Web. But because a distributed storage layer didn't exist at that time that could provide accessibility to those annotations or to their storage, they ended up pulling it out. And so he says at the bottom of this note, "I often wonder how the internet would have turned out differently if users had been able to annotate everything, to add new layers of knowledge to all knowledge, on and on." So for 20 years, in a sense, we've not had that capability even though we've seen the need for it. And there have been a lot of reasons why we haven't been able to build it. It's not trivial, and it's not just the lack of a distributed storage layer.
It's this litany of things that I have on the screen, which ranges from, prior to the W3C efforts, no predominant standards; the problems that are discussed often in this environment at CNI, the lack of document or entity identification and identity management; the problems of integrating this with publishing workflow; and then really the biggie, what's called the cold start problem, of generating enough material within a particular domain so that it becomes attractive to continue to engage in that domain. This is a solvable problem. One look at Twitter will inform you that this is done successfully in some contexts, somewhat surprisingly indeed to software creators at times. So we take great heart from that, and so that's where Hypothes.is is coming in. So we are trying to build a reference implementation of software that will help institutions and organizations start working with annotation in domains, to jump-start, to overcome in part that cold start problem. We are funded through a variety of sources right now. We actually initially started out with some Kickstarter funding, but we have major funding from Sloan. We also have funding from Shuttleworth and the Knight Foundation, and the Andrew W. Mellon Foundation is generously funding a conference that we're putting on in April, this month. The goals of open annotation are to build an annotation system which is standards-based, where the annotations are interoperable, where they're addressable, where they work across representations and formats, such as the PDF and HTML demo that I provided at first, with the ability to point into documents to reference specific sections of text, certainly enabling, through referenceability, cross-network contextualization, enabling threaded discussions, and particularly being resilient to changes in document structure or flow. The conference that Mellon is funding is called I Annotate.
It's next week actually, now scarily soon, and it's a small conference by invitation, about 100 people, and the idea there is to pull together as many use cases as possible to generate an understanding of how annotation might work, not only in scholarly literature but also in domains like open government, data handling, journalism, and other areas where sustained engagement in a discussion might have high value and might be productive in terms of use cases that help us generate the parameters of the software solution that we're trying to provide. So I just want to introduce quickly a few basic terms. Those of you familiar with the annotation model of the W3C will be familiar with this, but just to grab a certain piece of text, this happens to be a discussion of optics, and the annotation is notable because it was made by an individual named Sir Isaac Newton. And so we provide basic naming of these elements. The annotation itself is referred to as the body of an annotation object. The target is the actual manuscript, and then if there's a particular component that's being annotated, a particular section of material, then we can refer to a selector that demarcates what that annotation is actually referencing. So this is just top-level terminology. This is a schematic of the W3C Open Annotation model, actually now slightly dated, but it gives you an idea of the RDF schema that lays out the specification, which Rob will be happy to volunteer a further explanation of should you so desire. Many of you have been, I'm sure, fortunate not to have to consider documents like this, and may you continue to be so blessed. But the work that's gone into laying out the specification in the RDF schema has tried to encompass a wide variety of use cases, and so by inclusion or removal of various elements you can actually convey a wide number of potential types of annotation.
So a completely specified annotation is what we consider to be an annotation in the context of a document: making a comment on a specific point of text. And you could add in further descriptors or metadata to describe the intent of the annotation and so forth. If you leave out a selection, then you're essentially making a comment on a work. If you omit the annotation body, without making an explicit commentary, you're highlighting text by including the target reference and a highlight specification or a selection specification. And you can also create, in essence, a bookmark pointing to an entity of interest, and so you can see here how an annotation system specified in the W3C could interrelate to a Zotero or Mendeley or other citation system, and how traditional academic citations might be referenced within an annotation environment. And it's worth pointing out this basic thing that we take for granted. We can point to documents, right? We have what consumers understand as URLs that point to the top level of a document, and that's a well-known system. But targeting that selector, naming that selector, is not something that's specified in the web context in any regularized way for any kind of media, right? Even in video, you can do timestamps, but that's not a very resilient system. There might be ways of approaching video through Mozilla's Popcorn, which is another JavaScript implementation that deals with some of the characteristics of a video stream. But this elemental problem is something that all of us in annotation have to solve in one way or another, and it is true for any kind of media. Genetic material and protein material all have to have domain-specific ways of denoting selections in order to provide a targeted context for an annotation. And as I referenced at the beginning, we do want this resilience also in terms of format linkages and format associations.
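The annotation variants just described (a full annotation, a comment on a work, a highlight, and a bookmark) can be sketched as a small data structure. This is only a minimal illustration: the class name `Annotation`, the fields `body`, `target`, and `selector`, and the string labels borrow the talk's terminology but are assumptions made for this sketch, not the W3C Open Annotation vocabulary and not the Hypothes.is implementation. The selector format shown is likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Illustrative model only; field names follow the talk, not the W3C spec."""
    target: str                      # URL of the document being annotated
    body: Optional[str] = None       # the commentary itself, if any
    selector: Optional[str] = None   # hypothetical pointer to a passage in the target

    def kind(self) -> str:
        """Classify the annotation by which elements are present."""
        if self.body and self.selector:
            return "annotation"        # comment on a specific passage
        if self.body:
            return "comment-on-work"   # commentary with no selection
        if self.selector:
            return "highlight"         # selection with no commentary
        return "bookmark"              # bare pointer to an entity of interest


# Hypothetical examples; the URL and selector strings are placeholders.
examples = [
    Annotation("https://example.org/article", "I disagree here.", "chars=120-180"),
    Annotation("https://example.org/article", "A useful overview."),
    Annotation("https://example.org/article", selector="chars=120-180"),
    Annotation("https://example.org/article"),
]

for example in examples:
    print(example.kind())
```

The design point is simply that one structure covers all four cases, and the type of annotation follows from which elements are included or left out.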
So being able to associate a PDF or an EPUB with an HTML representation, for example, in a world of textually dominant documents, is something that's a priority, and something that has to come not only through algorithmic matching, machine-based determination, but also needs to come through refactoring of publishing workflows. And publishing not just with a capital P but with a lowercase p: to the extent that we're all publishers, we all need to work towards systems that engineer in as much syntax glue as possible to aid machine conveyance of that similarity. And we also want, in the context of an annotation, to be resilient to changes. Here is just a spelling example, and so we have to figure out fuzzy matching that works to grab, in this case in a textual context, a prefix and postfix kind of bookmark, so that a text string can be identified regardless of its eventual location in a document, and that could very well be the case even if, for example, a whole paragraph moves from one location to another. Many commercial annotation systems that work within a given domain, like Readmill, which is an ebook passage selection system, use as many of the approaches as they can in order to ensure that the user gets to the representation of the passage that they've selected. So that could be working with the DOM of the document, that could be using XPath, that could be using fuzzy matching, depending upon the context, and so forth. So just a few minutes on how this would work in a journal context. There are a couple of obvious use cases in traditional scholarly literature.
One of those is in peer review, and particularly in a peer review context reputation and identity become really important. In any at-scale annotation system, we recognize that not only is identity a critical factor, but reputation is an issue, and an annotation system is going to be subject to spam, to trolling, and to other maleficent activity. And so some of that can be detected through machine algorithms, by patterns of behavior or patterns of expression. Some of it may require certain kinds of moderation, and certain kinds of moderation may be highly desirable in a publishing context given a particular workflow, again in the journal context of moving an article through a submission process. So in a pre-publication peer review, there are lots of questions that would have to be addressed that are addressed manually now. But if you are implementing an annotation system, you have to think about whole-document comments versus commenting on particular elements of text; the scoring of both the review and the reviewer; whether or not you have an open peer review process, a closed peer review process, or a hybrid between those. You have to consider whether or not you're assigning reviewers in your workflow or through growing use of reputation, whether you are designating reviewers through selection or sponsoring self-selection of review, and so forth. So annotation is not in and of itself peer review in a box. There is a whole set of existing workflow issues that would have to be considered. Post-publication peer review is more similar to open discourse on the net and is, I wouldn't say trivial, but a much more straightforward case. It is more susceptible to some of the spam or trolling cases, but in another sense is more easily controlled; we have more experience with this kind of system just through open commenting on blogs. So that's it for my formal presentation. This is the team of individuals that's been working on it, either in whole or in part. Many people are beginning to contribute to this.
This is just the software level. Rob will tell you that the W3C community is extremely active and very highly engaged, and that is the specification component of this work. And that's it for me, so I welcome your questions. Thank you.