You can all see that? You can all hear me? Great. So my name is Scott Ananian. I work for the Wikimedia Foundation. We publish Wikipedia, Wikisource, Wikibooks; not WikiLeaks, not WikiTribune. And I'm going to describe today how we currently annotate content, broadly speaking, without using the fancy new W3C annotation tools, and then what we hope to do in the future using standardized annotations as well.
So first, let me describe our goal: to allow collaboration to create knowledge without barriers. There are three interesting properties of annotations on Wikimedia projects. First, neither the source document nor the annotations are static. This isn't write once, or update with corrections once a year. Wikipedia and its sister projects get 10 edits a second. Maintaining stable anchors is not just a challenge; it's actually the key part of the project. Second, there isn't a single author or editor of the base document, nor of the annotation. This is a collaborative process. There are 30 million user accounts on Wikipedia, of which over 100,000 are active editors who have edited in the past 30 days. There's deliberately no ownership, so there's no one person who owns any particular annotation. Third, not only is there a bazaar of collaborators, it's really a sort of bazaar of bazaars, because there are lots of different collaborative projects. And what we're trying to do with building annotations further into the platform is to make it easier to build new types of projects on top of annotations. So there are no barriers: if you think of some new property that you want to annotate, there should be as little barrier as possible to you going off and doing that.
So here's a short example with article content. This is article content in wikitext, and it illustrates how we annotate content for translation. Wikitext is our internal markup format; I'll return to that in a little bit. So this is just some content.
The first step is to mark the annotated regions, which we do explicitly in wikitext with these things that look like pseudo-HTML tags. And then we have a translation UI which shows you those marked-up regions and lets the translator go ahead and translate each one. This one will be translated into Italian, it looks like. And then, of course, because it's Wikipedia, the original article quickly gets changed. So we have a UI to migrate those changes to the new version of the article, including little details like: does this change actually invalidate the translation? Maybe it was just a copy-editing change, fixing punctuation or something, and the original translation can still be used. And once you've done that, it notifies the translators to update the translation.
So here are the important properties of annotated content on our projects. First, the annotation is generally transformative. It's some independent piece of content with a relationship to the base, and both require maintenance. It's not like there's a parent and a less important child; they're both sort of co-equal. And the desired property is that the annotation is decoupled from changes to the base text, because the base text is going to change. Further, the original author doesn't need to update the translation at the same time they're updating the base text. In fact, in many cases, they can't: they don't speak that language. The annotation is associated with a specific revision of the base content, although that revision might not stay current for long. And there's an explicit migration process, which is slightly different for each type of annotation we support. We can't simply assume that the annotation is going to apply after the edits have been made, although we can provide good tools to migrate annotations quickly when they do still apply.
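Going back to the migration step for translations, here is a minimal Python sketch of the question "does this change actually invalidate the translation?". It assumes the regions are wrapped in `<translate>` tags, as a stand-in for the pseudo-HTML tags on the slide, and the names `Translation`, `extract_units`, `normalize`, and `needs_retranslation` are hypothetical. The real tooling is certainly more involved than this.

```python
import re
import string
from dataclasses import dataclass
from typing import List

# Assumed marker syntax: <translate>...</translate> around each region.
UNIT_RE = re.compile(r"<translate>(.*?)</translate>", re.DOTALL)


@dataclass
class Translation:
    source_revision: int   # revision of the base text that was translated
    source_text: str       # the marked region as it read at that revision
    translated_text: str


def extract_units(wikitext: str) -> List[str]:
    """Return the text of each marked region, in document order."""
    return [unit.strip() for unit in UNIT_RE.findall(wikitext)]


def normalize(text: str) -> str:
    """Drop ASCII punctuation, case, and extra whitespace so copy edits compare equal."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.lower().split())


def needs_retranslation(translation: Translation, new_source: str) -> bool:
    """True when the edit changed more than punctuation, case, or spacing."""
    return normalize(translation.source_text) != normalize(new_source)
```

With a translation recorded against `"The cat sat on the mat"`, a new revision reading `"The cat sat on the mat."` would keep the existing translation, while `"The dog sat on the mat."` would flag it for the translators. Where to draw that line is a policy choice, and each type of annotation would probably want its own version of `normalize`.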
And finally, as a cultural matter, we've found that it's important to make it very obvious when something is broken and needs to be fixed. When an edit is made to Wikipedia, it goes live immediately, so if the edit is wrong, it quickly gets noticed and reverted. Similarly, if an edit invalidates the translation, we make that very obvious. Usually we display the untranslated text for that paragraph, so that speakers of that language can quickly notice that something's wrong and update the translation. And we're always concerned with annotation visibility slash invisibility. We don't ever want to insert too many barriers to people making edits to the base content. So we have this kind of infrastructure of stuff on top of the base content, but the key is that none of it should interfere with the rest. In fact, there are some projects we haven't done yet because we couldn't figure out how to make them possible without interfering with the base content too much.
And as mentioned before, there are lots of authors and lots of annotators collaborating. There are also many different annotation projects; translation is just one instance. So here are some others on Wikipedia. These are all existing. We have a system to convert between writing systems for certain Wikipedias, between Cyrillic and Latin scripts, for example, or between simplified and traditional Chinese. And we use inline annotations to mark exceptions to the automatic conversion process. So this is the word for computer, which is written in different ways. If our transliteration process doesn't understand it, there's an explicit annotation that goes along with that. We obviously care a lot about citations in Wikipedia, but our citations currently are just footnotes. They mark a specific point in the text, right? What we'd really like to do is mark the entire region that is supported by a specific citation.
We support conversations between editors, currently using a sort of invisible markup with the HTML comment syntax, which isn't great, but it's very necessary. It's usually used to explain, for example, that there's a reason why in this article "colour" is spelled consistently with a U. Please don't change it, because otherwise it just wastes everyone's time. And that's the simplified version, you know, compared to when you get into political debates and other stuff. Those comments are really important, but they're not part of the reader's view. They're part of the editor's view. The Wikisource project is a project to transcribe scanned documents, scanned PDFs, and it uses its own ad hoc system for maintaining the correspondence between the transcribed region and the part of the PDF that it is a transcription of. And a lot of the social processes that we use just to maintain the encyclopedia use tags or templates in the wikitext, like the famous "citation needed" template. Those can also be considered a form of annotation on top of the base content.
There are also new use cases that we'd like to support, where we're constrained by the visibility of the annotation in the source document or just by the overhead. Currently all of our annotations are kind of ad hoc, done in different ways, and that imposes a big barrier to entry to using them for new stuff. So for example, we would like to use annotations to represent proposed edits or approved revisions. Representing that inline in the wikitext would be very cumbersome. We have a project currently underway to do media annotations on Wikimedia Commons, where we've got rich file formats: images, video, audio, that sort of thing. There are fine-grained translation correspondences, down at the level of individual sections. An article in Spanish Wikipedia is not generally a straight translation of the article in English Wikipedia, but there might be certain sections which are translated from one to the other.
We'd like to maintain those correspondences more closely and use them eventually to train machine translation tools. We have a project called Wikispeech, which makes our articles accessible to visually impaired people. We'd like to annotate words which require specific pronunciation hints, in the same way that we do for language conversion. We would also like to incorporate more presentational annotations. Our content is being used in a lot more different formats. We'd like to upgrade the visual look of Wikipedia and use pull quotes or fancy figure references. For mobile, for example, there's a lead image which is displayed at the top of the page. We'd like to use some sort of annotation to mark which of the images in this article is actually the lead image, whether it should be displayed full width, whether it's portrait or landscape, all sorts of different tweaks.
So here are some of the issues, and this is sort of the reason I'm here, and these are the things I'd like to talk over with you today. I'll be here tomorrow as well. Some of these are more or less settled. So this is the slide for things we have a reasonably good answer to, but which seem to be in common with things I've heard from the rest of you today. One is how to name the document. We have both a name for the article and a name for the specific revision of the article. And if I want to interoperate with other people: when you're looking at the article, the URL in your browser is that top thing, but what we actually store the annotations against is that specific revision, which is something different. Then, where to store the annotations? We actually have a refactoring of our underlying back-end called multi-content revisions, which will make it a lot easier to store a bunch of associated content with an article in our database. That's coming down the pike, and that's how we plan to store them in the future.
But it means that potentially we give up interoperability, because you can't ping the Hypothesis service and see that there are annotations on this article. Also, how to anchor the annotations. I showed you wikitext, which is still what a lot of our power users use. But that doesn't apply directly to the rendered HTML that you see on the page. And we're transitioning to a sort of editable HTML, which has a lot more of the metadata that's used to actually edit the content associated with it inline, like where a template included on the page starts and ends, and things like that. We have a spec for that. So basically we'd likely anchor the annotations in all three of those ways, to make them most useful for different uses. And then content types. Most of these annotations, the things I just described, are not plain text. For example, even the simple content translation one is wikitext, not really plain text. So there's some question of interoperability. If we just give this to you to display, you can see that there's an annotation here, but you might not know what to do with it.
The question of how to migrate annotations between revisions is sort of fundamental for us. And I don't think there's a one-size-fits-all model there. Different uses will have different tolerances for fuzzy matches and things like that. I'd love to collaborate on a more robust underlying layer that we can all share, to encapsulate some of the best practices there. Also, how to export the annotations for interoperability at an API level. Other than standing in front of a wonderful conference like this and saying, hey, there are annotations hidden inside Wikipedia, how do we make that discoverable, and how do we make it easier for multi-annotation browsers to surface this stuff?
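As a rough illustration of the naming, anchoring, and migration questions above, the following Python sketch builds an annotation shaped like the W3C Web Annotation Data Model, targets a specific revision rather than the article URL, and then tries to re-anchor the quoted text in a later revision with a tolerance chosen per use. The `oldid` permalink form, the `text/x-wiki` body format, the helper names `make_annotation` and `reanchor`, and the fixed-window matching are all assumptions for illustration, not a description of what Wikimedia ships.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple
from urllib.parse import quote


def make_annotation(title: str, revision_id: int, exact: str,
                    prefix: str, suffix: str, body_wikitext: str) -> dict:
    """Build a dict shaped like a W3C Web Annotation with a TextQuoteSelector."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": {
            "type": "TextualBody",
            "value": body_wikitext,
            "format": "text/x-wiki",  # assumed media type for wikitext bodies
        },
        "target": {
            # Assumed permalink form: names the revision, not just the article.
            "source": ("https://en.wikipedia.org/w/index.php"
                       f"?title={quote(title)}&oldid={revision_id}"),
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


def reanchor(exact: str, new_text: str,
             tolerance: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the quote in new_text, or None if no match is good enough."""
    start = new_text.find(exact)
    if start != -1:
        return start, start + len(exact)  # unchanged text: anchor carries over
    best_ratio, best_span = 0.0, None
    window = len(exact)
    # Naive scan: compare the quote against every same-length window.
    for i in range(max(1, len(new_text) - window + 1)):
        candidate = new_text[i:i + window]
        ratio = SequenceMatcher(None, exact, candidate).ratio()
        if ratio > best_ratio:
            best_ratio, best_span = ratio, (i, i + len(candidate))
    return best_span if best_ratio >= tolerance else None
```

A translation might call `reanchor` with a strict tolerance and fall back to showing the untranslated text when it returns `None`, while an editor comment might accept a much looser match. The scan here is quadratic and ignores the selector's prefix and suffix, so it is only meant to show where a shared, more robust layer would plug in.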
And then ecosystem. What we'd really like to do is see where the fact that we're using a standard format will allow new things that we couldn't do by ourselves. If we're storing all our content in wikitext, which no one other than us can parse, and we're storing it in our own backend, and no one really knows it's there, we're missing a lot of the ability to build a better ecosystem around annotations in general. And so I'd love to talk to you all. I know a lot of people are interested in citations. That's probably the first thing that we could build an ecosystem around. But we also have lots of media, and people who are interested in the general process of annotating media. We'd love to collaborate on that and make our annotations more visible to other people there.
So that's all I have. Thank you very much. And I should say that this talk is published. You can get it either from my user page on Wikipedia, or we'll try to put the link in the agenda somewhere. And you'll have noticed there were a lot of hyperlinks there. So if you're interested in diving deeper into one of those use cases, you can click around or just talk to me. Thanks.