For a variety of reasons, I've been terribly sleep short lately. I've been traveling a lot and I'm in a major deficit. And so when Dan asked me if I'd drive in from Marin at 8:30 this morning, it really tested my commitment to both Dan and this organization. I'd rather be dreaming. But the reason I'm here instead is that I have a dream, and you folks are part of it, intimately, I think. And my dream is pretty simple. It's the same dream I've had since I discovered the internet back in about 1985. And that dream is that we will create a time, maybe sooner than many people would think, in which anybody anywhere can know as much as they are intellectually capable of assimilating about any subject as is presently known to all of humanity. I mean that some kid in Mali can know as much about some particular intricacy of proteomics as he is capable of learning, because the information will be available to him.

Back in 1993, Marc Andreessen added and then removed annotation from Mosaic. This is pretty much how the history of annotation on the web seems to have gone down for the last 20 years.

There have been various annotation platforms in the past where they've simply said, "Well, here is my XML schema, here is my JSON schema, and it's going to be locked up in my little platform," and it hasn't worked out very well. So by following the architecture of the web we see that it's going to be, or we hope that it's going to be, more sustainable, more interoperable and just better with regard to future-proofing. This is a challenge that we think can be solved in a community, if not by a single company, because it's not about whether everyone has exactly the same technology. It's about getting people together to discuss things, in situations like this and online asynchronously with the community group, so that we arrive at a common understanding of what is required to be done so that everyone benefits from the interoperability of annotation.
So we're trying to create consensus within the community as to a single way to do things, even if it's not exactly what everyone is currently doing now.

How do we try to solve this problem, where a thousand years from now we might look back and actually have a record of the conversations that happen around the things that are really important to us?

The question is, is this T0? Is this the point at which the denizens of the future will look back and say, "Presto, annotation started happening in 2013"? I don't know, but if it's going to happen then we need not just to build the technology and create the standards; we actually have to get people annotating. We've got some early annotation technology, enough that we could probably start using it to make some real annotations, but what we need to do is try to figure out what real people, real communities, need in order to actually use it to do useful things. And that's why you guys are here, and thank you for coming.

So a couple of things that are important about these kinds of annotations and annotation systems: the spatial relationships are very important, and by that I also mean that they are ambient, that they actually occupy space around us. They require a lot of graphical elements, and they require the development of a language, either a personal one or a shared one. This is critical, to develop a system that we all kind of understand, and it's very difficult for people to enter into a book sprint once it has started, because they don't have the shared understanding, the shared language and shorthand and notation and annotation.

First of all, we need to provide a platform, a web-based platform, where this stuff can happen. Stop rolling your own and doing it in your own little courtyard. We just have to provide it at a global scale.
So we think of a web-based portal that supports different roles. Scientists can create their own annotations and reuse them, but librarians in particular, who are a trusted source of curated metadata, are best informed and prepared to actually create the annotations that we know we need to incorporate in our system.

And I've seen pretty amazing cutting-edge research being done in science, but then when we write up that research, the tools that we're using are tools like Word and LaTeX, for example, that, whether we like it or not, were not built for the web and were not built for collaboration. And what's even worse is that we're packaging all that information, when we disseminate science, in things like papers in PDF that look a little bit like that. So we lock it up in papers, and the thing here is that if we want to change the way that we disseminate science, we need to change the way we write science.

These days most scientists are still printing out papers, and so their notes are on paper. Some are starting to use technologies: they have PDF annotators and some HTML annotators, and they use Twitter, blogs, emails. And so if at the end of the month someone asks you, "Can you aggregate your own annotations? Can you give me an idea of what you thought in the last month?", how hard is it to pull that together?

If you've got all of the prior versions of the article, the pre-print, the draft, the poster presentation, the abstract, and you can annotate those as a peer review process and see the annotations and see it in the final form and everything, then that's clearly a much more powerful form of peer review than just annotating or just peer reviewing the final fossilized journal article that gets submitted to the journal.

Now all of a sudden you've got a lot more access to unvetted materials, in a kind of slightly haphazard way, and this kind of invites some kind of comment or review streams in here that are almost reactive.
One of the complaints about peer review is that you just get somebody who's listing a whole lot of snarky comments just to be clever or funny, and I think what we're seeing is that if people are aware that perhaps those comments will be made publicly available on paper, they're actually being a lot more pleasant.

You know, in academic life those anonymous comments were really depended on to get people's real feelings, because people are so sensitive about commenting on their peers publicly. Egos are huge.

What we're seeing is that the more senior people in the field are happy to provide their name. So often when the name is provided, it's because they're a clear expert in the field and they're happy to stand by their word; their reputation is made. And usually it's the younger people, still trying to make their reputation, who aren't prepared to. You know, at some level, if you believe the words you've written, you should be prepared to stand by them and put your name by them.

How much do you have to insist on people standing behind what they say? There are use cases for people needing to be private. How private do they need to be? Some attributes are more important to some communities than to other communities. You can't assume that everything is consistent everywhere.

If we think about annotation as a really old thing that humans have done for a long time, how does that inform the way that we move it into the current tools that we have to do annotation with, addressing the persistent problem of how the interactions around annotations and around text can sometimes turn quite nasty quite fast in the anonymity of the web? That kind of gets at some of the issues of reputation.
We started talking about how you could leverage existing in-person social interaction cues that we all sort of learn inductively and appropriate, and how some of those conventions could be moved and structured into the online space to kind of mitigate some of the nastiness that comes from "I can be really awful to you online and there's no social consequence or embarrassment that happens from that."

When you cite something, in a way you annotate it, but you also validate it, and I think it would be useful for us to have tools that would make more explicit the extent to which a citation is validating a piece of research. That is all very implicit now. The cultural convention in science is that when you cite something you're obviously not disproving it, and so in a sense you're validating it, because you have this sort of myth of, you know, "I'm only reporting what's there in nature," as opposed to "I say this." You have somebody else say they reported that from nature, and so before you know it, it's canon. So I think we should collectively look at systems that make this whole trail of citation much more explicit: what is it that the claim was, and what part of that claim are you supporting, or do you need for your own argument, so that you can trace it back to the original data?

How do we get a universal shared repository of citation data that can be flexibly used by a broad number of tools, so that it's appropriate for all of the different domains? The idea behind citations as they are, versus what they should be: having citations that take advantage of the digital nature of our medium would be great.

So one of the key things, as we think about annotation and description of the life on earth and taxonomic description, is a big problem that we've had for the past 250-plus years, which is something we call the taxonomic impediment within the community. The basic problem here is that we've collected all of this information over time, and we've filed it in file cabinets, in books
and in huge specimen collections in our large natural history and botanical garden collections, where we literally have hundreds of millions of specimens that have been collected over the years. The problem is none of this is linked together. The data is often locked up in analog sources. How do we share those resources? How do we share the commentary, the annotations that have occurred on all of these types of material over those years, in order to better understand what we have on planet Earth and what we're losing? But increasingly a large amount of our use comes from other machines. So again, systems like Tropicos at the Missouri Botanical Garden and the Encyclopedia of Life, all of those types of systems and tools, pull in this content. So it's not just annotation that needs to occur within our own platform, in that silo; the annotations would need to move between the different systems where this content begins to flow around the infosphere.

But I think that we're missing a big trick by not utilizing the web to communicate science as well as we all know, in this room, that we could. And so Threads is an idea which I think has the potential to help a little bit, just a little bit. So what is a thread? Like I said, it's like science: you take articles (there are three articles there, with components), you pull out components, and then you just publish those components, each as its own individual published object. This is the thread.

We probably already know this, but the volume of human-generated content is already vast, and it's going to grow. And automation supports all of those good things that we've been talking about: displaying facts, making it easy to navigate the content we're interested in, connecting hitherto disconnected but related content, and the systematic analysis of vast corpora. So ultimately it opens the door to new discoveries by realizing that potential of connecting the sum of human knowledge. But to do that we have to automate.

What we are doing, while the
users are annotating and typing the text, is analyzing their text on the fly and proposing tags, semantic tags. This means the labels you see here, like Mediterranean Sea, Strait of Gibraltar, etc., are not strings. Those are links to Wikipedia, and the user can accept or reject those tags. So users essentially create positive and negative tagging relationships to Wikipedia, which we use in this case, while they're writing the annotations. So there's a very precise way of getting unambiguous references to clearly defined concepts instead of strings.

On the one hand, we're feeling like there isn't enough participation, people aren't annotating enough. You know, do we want to get as much community as possible involved in this stuff? But I hope we're not doing that for its own sake, because I think Mr. Barlow used the word "metabolism", and maybe that's a better way to back up and think about what we're doing: are we finding truth in all these things, or are we just doing pretty elaborate data structure manipulations? Which I think is a great thing to do, but maybe we want to be doing more than that.

But there's more than just preservation. Ideally you want the data that is created by one group to be usable by another. We should not be reinventing the wheel. We should not be remeasuring the same data. Right now many groups say that they cannot reproduce someone else's research, so they just do it themselves over again; they reinvent ways of doing things. So it's essential that data is usable by others, so we can do better science and improve interdisciplinary work and really get better and help this kid in Mali get the next Nobel Prize.

I do think it's essential there that we create systems that allow the researchers immediate and complete control over their own research data, but then these systems can excrete semantic metadata that can be shared with larger repositories, so you allow them to titrate what part of their research they're exposing to exactly which part of
the outside world.

Another issue is the matter of the fear of being scooped. If people do something interesting with their data, they're afraid other people will find even more interesting aspects of their data. And so we think that the key to this would be really looking at the reward systems, looking at the way that research is funded (and it's great to have some funders in the room), to think of how you can move towards a system where you have a shared mission, this idea of going to Mars or mapping the human genome, where collectively you need all your data because you need everybody's brain power to work together. It's not just a competition over who gets there first; it's how do we all get there.

The bottom line, I think, is that research data sets are ripe for crowdsourced techniques for adding value, such as annotation, and this applies to the long-tail data as well as to the larger, more celebrated data sets. A really good example: we can learn so much from what Zooniverse has done under the Citizen Science Alliance, from galaxy classification to planet discovery, mining old ship logs and weather data, crowdsourcing with consumer-based help to conduct all sorts of science very creatively, turning things into games. That's all wonderful stuff.

What I came up with was: it's an efficient, rapid, centralized scholarly communication system. And I'd just like to drill down for a second to try and pull out things that are relevant. It's efficient: it's very low cost, which means we don't have much effort to spend. It's rapid: one-day turnaround, so it has to be automatic. Centralized actually turns out to be quite important: it is the one go-to place for scholars in a number of disciplines, which means we actually have a nice opportunity to do things to help a community in one place. And it's scholarly: it's not designed for the great unwashed to communicate their ideas to the scholarly domain; it's designed for communities to talk, to a first approximation, within themselves.

So with a
view to a larger information environmentalism, two final points. It's important to capture the conversation, which is what we're here to do. It's important to measure, rank and summarize the conversation. Fundamentally, it needs to be real-time, integrated as metadata, and mobile (it moves with the object), and we'd like to have quantitative and qualitative summaries of these conversations.

Now all of these are techniques that we have developed over time to translate these technical core documents to people who want to get involved, but it tends to be more one-way than two-way. We get a lot of feedback in all sorts of ways: we get people who send us emails, we get tweets, we get people who write blog posts, people who just stop us and talk to us about these things. But we haven't traditionally seen a kind of collaborative response.

Opening up a bill for annotation: how do you do that if you don't necessarily have a community of people? Software code is a really interesting community, because there's a sizable community of people who both understand code and are inclined to engage about it online, in a way that you don't see historically, traditionally, with statutes and laws.

So it's like, what can you do so it's more dialogical, where you go out to communities that you want to engage with and ask, "What are your objections?", and then try to understand and distill their argument, and then rebut it, and then you end up with a document or a site that says, you know, "Here are the other arguments." So if someone comes to it from that group, they can look at it and be like, "Oh, I see, yes, they're acknowledging the concern."

So, you know, what we're trying to achieve through Annotation Studio is to really engage students in the process of interpretation, and of course that's done through close reading, developing an argument, and making connections across texts, because, you know, texts have sources, they have adaptations, and looking at this
whole ecosystem of how texts have evolved, the fluid process, the fluidity of text, is really important to get a sense of what that means. So it's also a shared and a participatory process. It's not the lone scholar or the lone student; it's part of a larger conversation that's going on. But it's also really reflecting on the process: what do students do when they annotate, when they read a text? So this also means making that process visible.

Rap Genius is fun. That's one of the things I feel like is sort of missing from this morning. I don't mean that as a criticism; again, I'm totally new here. But the success of Rap Genius, on some level, at least for me, was that I didn't know where any of these tools were that you guys were talking about before, but Rap Genius was there, and Rap Genius was fun for the kids. There's a playfulness about it, and it's kind of more relaxed. And it's not that it's not scholarly: on every page we acknowledge the top scholars of a particular text or a particular author. But it's fun. You can play around on Rap Genius. It really is the unwashed, in a lot of senses. It's inclusive of everybody as scholars, not restricting that to any academic institution, and inviting everybody to participate in this way and encouraging them to do so through some pretty fun features that I don't think distract from close reading. I think they add to the engagement of the very same skills that everybody in the humanities and in other classrooms is trying to get students involved in: close reading, analysis, being informed.

One of the great aspects of this is that students who are, if you want, the wallflowers, who don't participate in discussions in class, have the opportunity to voice their perspective in this environment, and so in that way a perspective that usually does not come into the conversation can be brought in.

These are really interesting multi-step processes that students have to go through, but that's also then part of the pedagogy
of that course. So faculty need to rethink how they actually construct the assignments and think about this multi-step process.

Right, they would read a chapter, you know, and annotate it, and be required to put an annotation online afterwards, but then that would fold into face-to-face conversation and fold back onto the online conversation. And they would be interacting with each other at the same time online at homework. You know, one kid would annotate something, another kid would come along and say, "I see it differently." So it was time and space sort of collapsing.

From the teacher's, the instructor's perspective: so how do we use annotations for teaching? And then from the student perspective: how do I use annotations for learning? But teachers can also learn from the annotations, so it's probably an iterative, continued process.

Annotation in education could be everything from, well, annotation as a pedagogy in itself, right, as a way of, as you said, motivating conversation, getting people to read critically, blah blah blah. Annotation could also just be a way of sort of capturing the exhaustive educational discourse.

Probably many people here would say that it's pretty interesting to think about students and teachers, obviously, but really about students taking that much more active role, in not just taking information as a given but seeing it as something which they can really consider in light of their understanding of what the expectations are, and then capturing that in some way, right, having that be part and parcel of the instructional process.

What high school kids are engaged in is maybe called something different, maybe it's a more rudimentary version of it, but it's the same thing that the college students are doing, and the same thing the scholars are doing, and the same thing that you're doing with your complicated manuscripts of, you know, medieval Europe or whatever. You know, breaking it down visually, it's all the same process. And you can call it annotation, and maybe that ostracizes people
but everybody does it. You do it when you walk out of a movie theater, right? You do it when you read the newspaper. You do it any time you look at something and you think something else. Yeah, it's a very basic skill, and it needs to be taught at different levels and to different degrees. But we've got to not create barriers to having conversations with different groups, and not create tools that are restricted in terms of application.

We can innovate all we want on the tech side and put all the tools we want on the table, but it goes nowhere unless we help faculty imagine how they can use them and then let that grow naturally. Having faculty show examples of how they have been using the tools is the best driver for adoption.

This is the big one: the ability to deal with the fact that the web changes. People move things around, and in general annotating dynamic content is really hard. Even if you did know how to implement it, you can't work out what the ideal behavior would be, because if the content changes enough, you can see that you probably don't actually want to keep annotating that content. The New York Times had a fabulous example where, if the content changed too much, your annotation would just evaporate. So this might be solved by multiple addressing or fuzzy anchoring, but I think perhaps the biggest, or most likely, way of solving this problem, which is my last point, Peter, is persistent reference. Or, to put that another way, can we please put the Internet, the whole Internet, in Git? For those of you who aren't overly familiar with Git, the important thing about Git is that it is an append-only log of content objects, and at the moment the Internet isn't append-only. I can remove stuff from the Internet. I'm looking at a man over here who's trying to build some pieces of this. The fact that Tim says cool URIs don't change is not enough. The web is not cool enough. URIs do change, and
the Internet Archive and Memento are trying to fix this, but I think that in some ways they're not ambitious enough. I want to be able to say, "This is the thing that I annotated," and know that that reference will exist forever. That is the point.

If I annotate a journal article as a PDF, those annotations should appear in the HTML version. If I annotate a book on one page or on a Kindle, that should appear on another page or on a Nook. And there are a couple of ways that we might solve this. One of them is by having some kind of canonical representation. One of them is by fuzzy anchoring or, if you like, what I call content-addressed annotation, where you address an annotation by the content that it annotates.

We want to build enhanced books: books with annotations, scientific books that include data sets, and many more things like that. We need powerful open standards, but there exist a whole bunch of different standards out there. Kindle and iBooks have different capabilities. They are authored differently. You know, they want everything a certain way, certainly they want the metadata in a certain way, and that's it. And then once it gets popped into their system, they're really the masters of that domain. So I originally came at this as a developer. I wanted to develop enhanced books; I wanted to develop things that used enhanced books. But when they come in proprietary formats, with not only technological baggage but also legal baggage, that makes developing novel applications, things like annotation, you know, pretty hard.

I definitely get the frustration, because I'm doing that too, but it's not an easy terrain to navigate. If what you want is to continue with the ecosystem that we have, with all the different kinds of publishers we have, then there's going to be some of this. Like, capitalism exists.

Part of our job is to talk more about what kind of targeting within books is appropriate and necessary to really be able to target them, or what kind of target requirements are people
going to want. New books, but now I point to paragraph X in a PDF copy of that same book versus the EPUB version, where it's possibly a version that's authorized for a poem. I have to do a fair amount of work to make that DOI useful.

So I think the trick here is that you want to identify functionality, you want to identify some features, and you want to say that these features and functionality should be present in as many of the annotation tools as possible. You want the individual tools to work in as many places as they can, but we recognize that there's going to be more than one tool. It's not one tool; it's going to be more than one tool, and each is sometimes going to have to make trade-offs in functionality.

Now, the idea that we started from is that everyone is a journalist on the web, that annotating is an act of reporting. The problem is that people are already empowered to some extent to do it by commenting, but commenters are notoriously bad journalists. I think that it's because the tools are bad that they're bad at it. Also, something that we figured out is that text comments are overkill most of the time. What people are trying to do is react to something on the web, and annotations give them the ability to do that. It would be best if the tool that gives people the ability to annotate had its own interface, but it would also be good if you could use the same types of annotations to point to anything that you did on any domain, a blog post of your own, a tweet, whatever, and count that as an annotation too.

I think annotations are a window onto the soul, and I think that annotations are important in digital media because they're a crucial form of interactivity in a medium where use is otherwise kind of invisible. And people like Baudrillard, you know, in the latter half of the 20th century, became quite exercised about this: the compact disc, it doesn't wear out even if you use it. Terrifying. It's as though you never used it, so it's as though you didn't exist.

Gathering is another kind
of annotation we don't think of much. We usually think of being inside the document, but what's next to the document is important too. And we can aggregate the marks of lots of different readers. I've tried doing this in a couple of different examples, and there's more consensus about what's important than you would think.

I'm feeling a really strong sense that, at its heart, we all still wish that the web were more of a read-write web. You know, if this community evaporates, another community will come along at some point in the future and will take up that role. I think what's important is that we want to try and find a way to make the implicit connections between these conversations more explicit on the web.

But there's a much more complicated and subtle set of problems that you folks are dealing with, which have to do with creating the ecosystem of meaning, the metabolic sort of truth that makes it possible for that kid to have access to the good stuff and not be completely overwhelmed by the irrelevant and the untrue. And I think that is what you are doing and what you have the leverage to do. Even though you are a small group of people, you are a very smart small group of people who are on to something that I think is the right thing to be on to if we're really going to achieve that dream, and I'm very grateful to you. I think this is really hard work. Just understanding what an annotation might be is not an easy thing, and trying to come up with a system that replaces or enhances peer review in a way that gets us closer to an understanding of what is, is not an easy thing to do. But I think that you are here, and from all evidence I see, you are doing it, and I'm very grateful to you for that.
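
The content-addressed annotation idea raised in the discussion of persistent references (address an annotation by the content it annotates, and fall back to fuzzy anchoring when that content has changed) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the approach of any tool named by the speakers: the function names `make_selector` and `anchor`, the SHA-256 digest and the 0.8 similarity threshold are all hypothetical choices, and the brute-force sliding window is there for clarity rather than speed.

```python
import hashlib
from difflib import SequenceMatcher


def make_selector(text, start, end):
    """Describe text[start:end] by its content rather than by its position.

    Hypothetical helper: the field names are illustrative only.
    """
    exact = text[start:end]
    return {
        "exact": exact,
        # A content hash gives the quote a location-independent identity.
        "digest": hashlib.sha256(exact.encode("utf-8")).hexdigest(),
    }


def anchor(selector, text, threshold=0.8):
    """Return (start, end) of the best match in text, or None.

    threshold is an assumed cut-off, not a recommended value.
    """
    exact = selector["exact"]

    # 1. Exact match: the quoted content still exists verbatim.
    pos = text.find(exact)
    if pos != -1:
        return pos, pos + len(exact)

    # 2. Fuzzy match: slide a window of the same length over the text
    #    and keep the most similar window.
    best_ratio, best_pos = 0.0, None
    for i in range(max(1, len(text) - len(exact) + 1)):
        window = text[i:i + len(exact)]
        ratio = SequenceMatcher(None, exact, window).ratio()
        if ratio > best_ratio:
            best_ratio, best_pos = ratio, i
    if best_pos is not None and best_ratio >= threshold:
        return best_pos, best_pos + len(exact)

    # 3. The content changed too much: the annotation is orphaned
    #    instead of being attached to unrelated text.
    return None
```

Because the selector carries the quoted text and not a page or byte offset, the same selector can be resolved against a PDF text layer, an HTML rendering or an e-book chapter. When nothing in the new text is similar enough, `anchor` returns `None`, which mirrors the behavior described above where an annotation evaporates once the content has changed too much.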