So, steal this presentation: there's a URL if you want to grab it an hour later. I'm Dan. I was involved in the travel startup that Joe talked about. In the 90s we built an incredible company, went from three to 600 people in what seemed like a crazy amount of time, went IPO, sold it and everything. I really came out of that experience kind of transformed, empowered to some degree, but also really wanting to make a different kind of impact: to have something that would have much more lasting appeal than, you know, getting through the exit, and that would reach out and touch people's lives in a more meaningful way. Over some fits and starts, that's led to this project, which I will tell you about.

So here's a proposition for you, a goal, a question: how would we annotate all knowledge? And probably the most important question that we'd want to ask is, why would we want to do something like that? So first let's think back a thousand years, to the year 1013, and think about some of the really interesting documents and bits of human knowledge that have been produced since then.

This is the Magna Carta. It was first written in the year 1215, and it is the basis out of which a lot of law came in England, a lot of the fundamental principles of human rights, things like habeas corpus and protections against double jeopardy. But all we have is a document. We don't have the thinking that went into it. We don't have the first draft. We don't have the edits that might have been made. We don't necessarily know how many people contributed to it, and we don't have any of the thinking that maybe came out of it later. I mean, yeah, there are historical documents that the historians might point to and say, you know, this was a result of that, but we can't derive them by looking at this document. This is a familiar one: the
Declaration of Independence. You know, a lot is said about the Founding Fathers. People argue about what their intentions were, what their meaning was when they wrote different pieces of this. We have lots of writings by Thomas Jefferson and Ben Franklin and so forth, people that were involved in this, and we try to intuit what they might have meant, but it's difficult to actually go back to the collaborative process that resulted in this and tease that apart from the document itself.

The Communist Manifesto: an interesting document that has obviously been very significant in human history. What went into it? What came out of it? What did people contemporaneous with this document, in the 1850s or 60s, think about it? What did they say about it? We have a little information, but not as much as we could.

This is Einstein's original paper on the special and general theory of relativity, published in 1916. What did the peers of Einstein's time have to say about this? All we know is papers, but we don't necessarily know the conversations that existed around these documents.

So that takes us to now. We have no shortage of things to annotate now: we have laws, scientific articles, the news, Wikipedia itself, books, data, a mountain of information being created. So if we start to annotate it, how do we know that the annotations that we create, the things that we say, might be preserved into the future? The web sometimes is very unstable as the years go by. How do we create a project, an organization, an effort to capture these things and store them, maybe for very long periods of time? In fact time, it turns out, is really the defining design criterion for how to create the world's annotated knowledge.

This is Caterina Fake's copy of Ulysses. She's an avid annotator; she marks up everything. So this is just one person's copy of Ulysses. One of the other really interesting design problems is that as you start to annotate things, it gets very crowded very quickly. So how do we
design for that?

In 1945 a very interesting guy named Vannevar Bush wrote an article in The Atlantic, which is still on their website, called "As We May Think," and in it he imagined this thing which we call the web now. He imagined this mechanical machine which you would use to browse the world's knowledge. In fact, he created the first tabbed web browser: it's tab A and tab B. This article is very interesting. Please read it; it's one of the most fascinating historical documents, and it really has brought us to where we are. Anybody in this room, I assume, has a profession and livelihood directly tied to this guy's thinking. Towards the end, after having imagined the web and what it would mean to have all the world's knowledge linked together at your fingertips through these cool glass plates, he starts to imagine what the implications of that would be: that a profession would arise of people whose sole purpose was to blaze trails through the world's knowledge and connect it together and link it. Not people that created the knowledge, but people that would come along later and weave and connect things, and share those trails and those linkings with other people.

So fast forward. There's an interesting company that got started recently, and last fall they raised 15 million dollars from Andreessen Horowitz. Some of you guys may remember this. Well, the really interesting thing was the blog post that Marc wrote about the investment that they made: back in 1993, when Eric Bina and I were first building Mosaic, it seemed obvious to us that users would want to annotate all text on the web. Our idea was that each web page would be a launchpad for insight and debate about its own contents. So we built a feature called group annotations right into the browser, and it worked great. All users could comment on any page, and discussions quickly ensued. Unfortunately, our implementation at the time required a server to host all the
annotations. We didn't have the time to properly build that server, which would have obviously had to scale to enormous size, and so we dropped the entire feature. I often wonder how the internet would have turned out differently if users had been able to annotate everything.

So there's been a lot of projects that have come, and some of them gone, most of them gone, since that time 20 years ago almost to the day: very well-meaning projects by people who have tried to create systems that would allow us to start annotating all the world's knowledge. And we basically still don't have that. So the question we started to ask in starting this project, in trying to imagine how we would solve this problem, is why. Why do we still not have this? We came up with a list of seven reasons. Sure, this is the short list; there's a lot more reasons, but these are the ones we think were most important.

No peer review model, so there's a lot of noise and not a lot of signal. They weren't annotation-based: some of them allowed you to comment inline, but a lot of them didn't. And really no way to powerfully link to those annotations that were created. No real focus on the cold start strategy: this is a tough problem, getting the world to start annotating everything, so how do you solve for that? Not open: not interoperable, not based on standards, not open source. Insufficient design to really solve for some of these problems. And really short-term thinking: this isn't a problem that we're trying to solve for a year or two, this is a problem we're trying to solve for a long time.

So Hypothesis was a project that we started to take those problems, which we found by interviewing a lot of the people that were involved in these previous projects, and invert them into a set of design criteria to solve. Hypothesis is a non-profit to enable the community moderation of the world's knowledge. We got a Kickstarter start, but we're funded through grants from the Sloan
Foundation, the Mellon Foundation, and the Shuttleworth Foundation. Our goal is really to bring two things together: annotation as the first one, and then peer review, a way to boost signal, as the second.

So why annotation? It turns out that annotation is one of the most powerful paradigms that we as humans have developed over time to come together and think about things collaboratively. This is a page out of the Talmud, created orally thousands of years ago and first written down in approximately 500 AD. What it is is the oral history, which is in the middle, surrounded by basically collaborative annotation over time by rabbis and scholars. It's threaded, there are off-page references, all the really interesting stuff that we still use in places like Reddit and Hacker News today as a way to think about things. This is peer review. It's post-publication peer review, which is actually the paradigm that we're moving towards in science now, with some of the new projects that are out there like PeerJ and eLife and PLOS and so forth.

So what exactly is annotation? An annotation is like an arrow with a payload. It has an address. It has something that's being said, something that's being brought with it, that you are contributing. And it has a target: it's specific in the way that it points into the things that it talks about. Annotation has been done for a long time. This is a page out of a treatise on optics that Sir Isaac Newton happened to write in the margin of, and we kept this book because of who he was and what he said. This is half of an annotation, or two-thirds of an annotation. It's got a target, so it's located specifically on the page. It has a payload, which is what Isaac said. But it's missing an address: you can't reference it. So that's the cool thing that we get with digital annotations, and with the web and URLs and so forth: we get that addressability of the thing that's being said, and that really unlocks the true power of annotation.
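The "arrow with a payload" picture can be put into a few lines of Python. This is only an illustrative sketch: the `Annotation` class, its field names, and the example URLs are assumptions made for this note and are not taken from Hypothesis or from any standard. It shows why the margin note counts as two-thirds of an annotation: it has a target and a payload but no address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical model of the three parts named in the talk."""
    target: str                    # what the annotation points into
    payload: Optional[str] = None  # what is being said
    address: Optional[str] = None  # where the annotation itself can be referenced

    def is_addressable(self) -> bool:
        # An annotation can be linked to only if it has its own address.
        return self.address is not None

    def parts_present(self) -> int:
        # Count how many of the three parts this annotation carries.
        parts = (self.target, self.payload, self.address)
        return sum(part is not None for part in parts)


# A handwritten margin note: located on the page, says something, has no URL.
margin_note = Annotation(
    target="a passage on a page of the optics treatise",
    payload="the handwritten note in the margin",
)

# A digital annotation: the same two parts plus an address (made-up URLs).
digital_note = Annotation(
    target="https://example.org/article",
    payload="I think this claim needs a citation.",
    address="https://annotations.example.org/a/1",
)

print(margin_note.parts_present(), margin_note.is_addressable())    # 2 False
print(digital_note.parts_present(), digital_note.is_addressable())  # 3 True
```

Making the address its own field is what later lets one annotation point at another.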
There is a movement now called Open Annotation. It's a W3C community group, which means it's not ossified yet as a standard, but it actually has a very vibrant, small group of people working together to create it. It was really formed by two groups that came together over the last couple of years: a bunch of scholars studying old manuscripts, who wanted annotations to be able to lay on top of images of those manuscripts, annotations that contributed to and translated them and provided context and historical references; together with a bunch of people out of the biomedical space, who wanted to use annotation to lay semantic concepts on top of journal articles that could be interpreted by reasoning machines, to understand when new articles would come out that would challenge the semantic relationships between, say, a gene and a protein or something. So those two groups came together and said, can we come up with a common way to create interoperable annotations that we both could share? Out of this is a draft, a new draft of which just came out a couple of weeks ago, and it's called Open Annotation. If you go to the W3C site you can read all about it. There are already millions of annotations being produced today using the precursor drafts of this, primarily in the biomedical field, where they're already starting to do these machine-readable annotations.

So what do we get with Open Annotation? We get the ability to point specifically to things. We can enable things like threaded discussion: each part of the thread is a first-order object. It's simply an annotation, an annotation of an annotation, and it's addressable. We can chain references from one place to another, like Vannevar imagined, trailblazing our way across the web. It's standards-based and interoperable, and it works across formats, locations, and different technologies.
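The idea that a reply is just an annotation whose target is another annotation's address can be sketched as follows. This is a toy in-memory model, not the Open Annotation serialization: the `Annotation` class, the `trail` and `replies` helpers, and all URLs are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Annotation:
    address: str  # URL of the annotation itself (made up here)
    target: str   # URL of a document or of another annotation
    body: str     # what is being said


def build_store(annotations: Iterable[Annotation]) -> Dict[str, Annotation]:
    # Index annotations by their own address so they can be targets too.
    return {a.address: a for a in annotations}


def trail(store: Dict[str, Annotation], address: str) -> List[str]:
    """Follow targets from one annotation until leaving the store."""
    path = [address]
    seen = {address}
    current = address
    while current in store:
        nxt = store[current].target
        path.append(nxt)
        if nxt in seen:  # guard against accidental cycles
            break
        seen.add(nxt)
        current = nxt
    return path


def replies(store: Dict[str, Annotation], address: str) -> List[str]:
    # Direct replies are the annotations whose target is this address.
    return [a.address for a in store.values() if a.target == address]


doc = "https://example.org/paper"
store = build_store([
    Annotation("https://a.example.org/1", doc, "This result seems surprising."),
    Annotation("https://a.example.org/2", "https://a.example.org/1", "See the follow-up study."),
    Annotation("https://a.example.org/3", "https://a.example.org/2", "Which one?"),
])

print(trail(store, "https://a.example.org/3"))
print(replies(store, "https://a.example.org/1"))
```

Because every step in the trail is a plain address, the depth of a reply is simply the number of hops before the trail reaches something that is not an annotation.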
There are some interesting things that you get when you think about the different flavors of the way that you might create an annotation, too. So an annotation has an address and a body, and it points to something like a URI or URL, but then it has what's called a selector. Say you were pointing to an image. The image had a URL on the web, but you weren't just pointing to the whole image; you were pointing to, like, a globular cluster in a star field of an astronomical image from some telescope. So you might actually use an SVG to draw a border around the cluster you were pointing to, and the selector might be the SVG laid on top of your target URL. So an annotation contains all three of these components. But a comment, like just a comment at the bottom of a web page, isn't pointing into the web page; it's just kind of below the fold, so it's missing this piece. A highlight is an annotation without a body: we're not trying to say anything, we're just selecting some text in yellow. A bookmark is kind of like an annotation that just points to a URI. So lots of different services on the web today that store things like this are actually flavors of annotation. When you tag something with SoundCloud, you're actually just bookmarking a song, maybe with a specific reference to a point in it, or if you're highlighting something with a website like Awesome Highlighter or something, you're really creating this kind of an annotation. So annotations can be very versatile in their ability to serve different kinds of things.

This is the web that we have today. When we point to things, when we reference things, primarily we reference them with URLs that point to the top of things. Yeah, we have anchor tags, but those kind of have to be implemented by the people that created the thing in the first place; they have to have created those intermediary hash anchors. It's very difficult for us to point specifically to a single character in a web page by
using a URL that's universally interoperable. So the promise that annotation brings is the ability to point inside of things, maybe with that payload, that comment that I want to contribute. That allows us to do interesting things, like point many times specifically into the thing that we're talking about as we make points or make an argument. We can also point to things like video, like point to a specific sequence of 20 seconds at the point where somebody said something that's interesting. We can point to a gene using a gene browser: I want to talk about this specific gene that codes for this protein, and I want to lay an annotation on it, so that when somebody else is browsing that protein they can discover it, maybe because a new paper was just written or somebody has an interesting data set that investigates that.

That ability to contribute thinking lets us also do all kinds of interesting things in that body, in that payload. We can challenge something that we're reading, or support it, or make a joke about it, or provide a reference to a third thing that relates to it, or make a spelling correction. We could suggest an alternate wording for the thing that we're pointing to, like an amendment to a bill in Congress. If we can aggregate the sentiment of many annotations on top of the same thing, maybe we can almost see something like a heat map on top of different places, where people, maybe people with strong reputation or people that we trust, tend to converge in their thinking about things.

That brings us to the second part of what we think is important, which is the peer review component. So how do we select for quality? How do we design systems that let us, like the squelch knob on a ham radio set, establish a noise floor, so that at least the things that we're looking at aren't spam or obvious trolls? And then once we've established that noise floor, how do we create a volume knob that lets us turn
the most interesting stuff up, so that it surfaces most readily when we're looking at different places in the page? We think this is a really fascinating problem, and actually it turns out it's a whole branch of study. It's called reputation theory, and it's kind of a blend of game mechanics, sociology, math, and statistics. So we had a little workshop last year and brought a bunch of people together to think about that, and came up with a kind of reference design for how to maybe implement it. I won't talk a lot about that tonight, but I'm happy to later.

So what are some design challenges of the project? The first thing I just want to say is that we don't have all the answers, right? We have a lot more questions than we do answers. I will show you some prototype stuff tonight, but we need your help. So if anybody thinks that this is an interesting problem, please come see me. We actually have an open req for a full-time UX person, by the way.

So here are some problems that we want to solve for. High volume: the sticky note approach to annotation, like you'd get with track changes in Microsoft Word, doesn't work. We can't just stick annotations on the page; we've got to separate the annotations from the page, disintermediate them, so we can deal with volume. So how do we do that from a design perspective on top of things like documents? Location and numbers: as I'm looking through a page, how do I see how many annotations there are, that there's a density, that there's a statistically significant blip of stuff there? And then how do I understand how many things there are before I decide whether I want to look into them or not? There's a lot of reasons you might want to annotate stuff. It might be for personal research, so I want to create notes that only I can see. I might want to annotate in a small group, like a class, on top of a document, and make notes that other people could see. Or I might want to make
something that the whole world can see, because I have something that's that important to say. So there are different contexts, and we've got to solve for that problem.

The fourth problem is one of the most challenging problems that people who have looked at this before have grappled with, which is that the web changes a lot. Go to an article in the New York Times: now that they're on the web, they actually change the articles all the time. There's a cool site called NewsDiffs where you can actually look at changes in New York Times articles, like minute by minute through the day. Sometimes these articles change 200%, twice over all the text in them, just in the space of eight hours. It's actually astonishing; I would not have even believed that if somebody had told me about it. So how do you stick an annotation to something that's continuously in flux? There are minor changes, where you kind of want to still have that annotation stick there because the change isn't significant enough in terms of meaning or in terms of Levenshtein distance. Then sometimes there are substantial changes to a document, where you wouldn't want to resurface the annotation but you might still want to know that there was at one point something there. So how do you deal with versioning?

In science, a lot of times you have multiple formats for exactly the same thing. You have a PDF and an HTML version of exactly the same article. Well, if you annotate the PDF, it sure would be nice if that annotation showed up on the HTML version. How do you solve for that? Not everything points to a specific place on a page; sometimes you just want to make a comment about the whole thing. So how do you deal with stuff that's specific in nature and stuff that's not, and integrate them holistically?

We really want annotations to show up wherever we're at on the web. We want them to be there like Marc Andreessen built this into the browser; that's kind of where it needs to be. But there's also a case for there
to be a website, or a place that I can go to see all the things that I annotated, see what's trending, stuff that's not at the website. So I've got to balance between something that I want to be portable and with me all the time and something that's not. Sometimes I may want an extension that I have in my browser so I can annotate things, but what if people don't have the extension in the browser? How do you create the ability for people to annotate things where people don't have extensions, and so forth? How do I show and surface sentiment and reputation? And how do I solve for the cold start?

So annotation should be everywhere we are. Let me show you a quick little prototype of some stuff that we've been experimenting with and are about to release in a very early kind of alpha form. This is currently a bookmarklet. We... okay, that must be like a hot corner or something going crazy on me. So I'm at Nature's website. This is an article, a pretty popular one, on the human biome, and I've got a little JavaScript I've injected into the DOM here. This is via bookmarklet, but we're about to package this as an extension. It pops out, and as I scroll down the page I can see annotations that relate to different pieces of text. So here is Stephen Jay Gould, who's come back from the grave to annotate this particular sentence. If I click on this, I can see the pull quote that he talked about, and I can see what the original annotator created that's associated directly with that text. And then there's a threaded discussion that can ensue. If I want to comment on this, I can make a small contribution here and save that.

So this is an idea, right? We don't have the right answers. There is a question of whether you even want flat versus threaded comments. I mean, it's one of the web's biggest idea wars. You know, get Jeff Atwood on one side and, I don't know who you'd get on the other side, maybe me, and we could sit here and argue about this for
hours. We chose initially a threaded model, simply because a thread allows specificity. If we're already predicating this whole project on the basis that we want to be specific, and particularly when we want to enable tools for people like scientists, we think a threaded model is an interesting place to start. It's the hardest model, conceptually and from an architectural point of view, to implement, and we have a view of this that we're tinkering around with that actually flattens this from a threaded view to a flat view, so maybe you could toggle between them. But an interesting open question for us is: is this an interesting approach?

The panel can usually be dragged out and back, so you can make a little more room if you want. We're also experimenting with how to do collapsible threads. If it gets really deep, 10 deep, you start to get everything shrunk up against the right-hand side of the frame, so maybe you should start collapsing threads past a certain depth to give you more room as you scroll through really deep conversations. So those are just some different ideas.

Everything here, this is actually in beta, it's not on this test site yet, but everything is linkable. For every single thread you'll be able to expose a URL to that thread, put it in a tweet, and have somebody click on it and be taken right back to this page, showing that annotation on top of it. That's something that we think is important, to allow the bits and pieces of conversation that are evolving around a certain thing to be shared with other people.

So let me go back to my presentation here. I'm gonna go this way. If you have a real quick question about this, cool, but I think there's a bigger question period at the end. I had some static slides for this, but I've shown some of this already. Creating an annotation is as simple as selecting some text. We want to contribute our thinking in a threaded view. We want to be able to expose links to things. We
want to be able to stick those links in tweets. All this is open-source software. You see there's a Hypothesis URL here, but if you want to run your own annotation store and have the links that people are capturing into threads come from your store of annotations, for instance, say you are IBM and you want people to be able to annotate corporate documents against a store of annotations that's behind your firewall, that people on the outside simply can't get to, then that URL might be from an internal resource as opposed to our service. Everything should be embeddable, so you should be able to take an annotation and stick it on a card and embed it in a blog, and have the things that are being talked about and pointed to be live and able to be accessed through URLs.

The change thing, like how do you deal with changing text, is really interesting. We've collaborated with some people there, and we're working on a library that takes some of the thinking that people have contributed over the last couple of years and combines it into a new library. So say I want to link to a piece of text like this one, and the text that I'm annotating is going to change. There's a spelling error there, so somebody comes along later and corrects that spelling error. So the thing I'm pointing to has actually changed from where I originally annotated it. How do I create a sticky annotation that still points to that? Well, this is a pretty simple example, because only one character changed. But what if different characters change? What if this was a more complicated example? It actually turns out to be one of the stickiest problems out there. We have an implementation that uses a prefix, postfix, and selection approach. One of the things we think is important about annotations is that we should be able to come up with an algorithm for creating robust anchors to annotations that would have worked
on documents a thousand years ago. So we don't want to use things like XPath or DOM-related ways to point into text, because first of all they don't work between formats, and second of all, if we move away from things like XPath in 10 years or 20 years or 50 years, then our annotations break. So how do we create something that's purely textual and contextual that solves this problem? We have a fuzzy anchoring algorithm that lets us lazily attach context wings and then come to a conclusion about whether the thing that's in the middle is close enough to what we originally annotated that we want to continue to surface that annotation.

We should be able to survive edits. So imagine we take that paragraph there and our co-author moves it to the bottom of the paper. We should still probably stick that annotation to that same paragraph, so we need an anchoring approach that allows us to do that. Like I said, we need solutions that let us annotate between documents that may be identical except that they're in different formats, so we need concepts like canonical targets that let us create common reference points between different formats.

We need to solve the storage problem. We have a preliminary agreement, and we're in the middle of structuring an agreement, first with the Internet Archive, to back up all annotations that are being made. We also are discussing the concept of sending them a pull request every time a page gets annotated, so they can take a fresh snapshot of it, so we actually have a version history of the page every time it was annotated. Obviously that only works for publicly facing pages that don't have robots.txt turned on, so we still need good, robust anchoring strategies for the rest of the web. But longer term we probably need to solve this problem using different strategies. You know, single copies of things are very brittle, so maybe we need to start looking at a truly federated approach using, you know,
strategies like BitTorrent DHT or other ways to create kind of a holographic image of what's being annotated.

And the last thing I'll say is that the cold start problem for us is maybe the most interesting and the most challenging of anything that we're looking at. We just announced a workshop here, funded by the Mellon Foundation, to bring together about a hundred of the people that are most focused on finding ways to annotate the knowledge in their communities, to help us think through how to solve some of these problems: people in the sciences, people focused on legislation and the law, journalists. The idea is to share annotation projects that they're working on, new technologies that have come out, things that are building on top of Open Annotation, but also to share perspectives from their communities about what they need as users, if we are to think about tackling this problem together. So this is the beginning of our thinking about how we solve this cold start problem. So that's it, that's the end of the presentation.

Great, thank you, Dan.
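The prefix, postfix, and selection anchoring mentioned in the talk can be illustrated with a small sketch. This is not the Hypothesis library's algorithm, which the talk does not spell out. It is a stand-in that assumes a brute-force scan with Python's standard `difflib.SequenceMatcher` similarity ratio in place of any particular edit-distance measure. The function names, the 20-character context width, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means identical (two empty strings also give 1.0).
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def make_anchor(text: str, start: int, end: int, context: int = 20) -> Dict[str, str]:
    """Record the selected text plus a little context on each side."""
    return {
        "prefix": text[max(0, start - context):start],
        "exact": text[start:end],
        "suffix": text[end:end + context],
    }


def find_anchor(text: str, anchor: Dict[str, str],
                threshold: float = 0.8) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span that best matches the anchor, or None.

    Slides a window the same length as the original selection across the
    text, so the returned span is approximate if the edit changed the length.
    """
    exact, prefix, suffix = anchor["exact"], anchor["prefix"], anchor["suffix"]
    width = len(exact)
    best_span, best_score = None, 0.0
    for start in range(0, len(text) - width + 1):
        end = start + width
        quote_score = similarity(exact, text[start:end])
        if quote_score < threshold:
            continue  # the middle is not close enough to what was annotated
        before = text[max(0, start - len(prefix)):start]
        after = text[end:end + len(suffix)]
        context_score = (similarity(prefix, before) + similarity(suffix, after)) / 2
        score = quote_score + 0.5 * context_score  # context breaks ties
        if score > best_score:
            best_span, best_score = (start, end), score
    return best_span


original = "Users should be able to recieve annotations on any page."
start = original.index("recieve annotations")
anchor = make_anchor(original, start, start + len("recieve annotations"))

# Later the spelling is corrected and a sentence is added in front.
revised = ("An editor added this sentence. "
           "Users should be able to receive annotations on any page.")
span = find_anchor(revised, anchor)
print(revised[span[0]:span[1]])
```

Because the anchor stores only text, the same lookup works on plain text extracted from HTML or from a PDF. The scan costs roughly the document length times the selection length per lookup, so a real implementation would need something faster.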