Thank you. Thank you for the invitation. I've been learning a lot; it's my first time at a conference about annotation, and I hope that I can not only learn but also contribute to some of the discussions about use cases, by describing what we think would be a very good application for annotations: the way curation happens in our field. Although I'm representing my own project, the Astrophysics Data System (ADS), I will describe much of what happens in astronomy generally, where curation happens in different archives and is then gathered together. I should mention that some of the things I'll talk about were brainstormed with my colleagues Michael Kurtz, who also works on ADS, and Chris Erdmann, who is the librarian at the Center for Astrophysics.

One quick introduction to our project. The ADS is a NASA-funded project: basically a repository of astronomy and physics papers. We index both open-access preprints, such as those from arXiv, and the published versions from peer-reviewed journals, as well as some gray literature. The service we provide is mainly to working scientists, who come to us to search and to keep up with what's being published. They love to keep track of how often their papers are cited; they build bibliography lists; they use authoring workflows like Authorea to integrate those bibliographies and write their papers. I put down some numbers, but without going into detail it suffices to say that every working astronomer uses ADS: there are about 20,000 active astronomers in the world, while our heavy users number over 55,000, so our reach goes beyond astronomy into physics and other areas, and we're indexed by all the major search engines. So we're one of the components of the archive ecosystem that supports astronomy research throughout the world.
Astronomy is a very international research effort, and since we try to aggregate and link resources together, a lot of people come to our site as the starting point for their research. Different systems curate data sets, which are then referenced back to the literature, and the connections are often created after the fact. The curation of data, or of claims about data, typically takes one of the three forms I have here. It may be a statement that a particular astronomical object has been mentioned in a paper, and that the paper reports a measurement of its redshift as 0.182; this is what one of the astronomical object databases that collect this information would record. Archivists who maintain the actual raw data might want to identify a paper because the data it studied is one of the data sets in their archive. Or it could be a semantic annotation that says, for example, that there is multi-wavelength data discussed in this paper and that it is about the interstellar medium. All of these things happen post-publication: the publication process captures only some of the things people want to talk about in scholarly discourse.

So one of the things we do is make it easy for end users to enrich the metadata. This is an example of one of our metadata records. We allow people to submit corrections to the record: even though the basic metadata will not change, we might have missed a citation or something else about the publication, and users can enhance the metadata in our system by simply pushing a button. The other thing we do is expose the connections between papers and research data. This is a list of results for a search I made on weak gravitational lensing, sorted by most cited papers, and we have a tab on the left showing that data products are available from a variety of archives.
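To make the first kind of curation statement concrete, here is a rough sketch, in Python, of how a claim like "this paper reports a redshift of 0.182 for this object" might be expressed in the style of the W3C Web Annotation data model. The identifiers, URLs, object name, and curator are invented for illustration; this is not our actual record format.

```python
import json

# Rough sketch (invented identifiers, not a production format): the claim
# "this paper reports redshift z = 0.182 for object NGC 0000" expressed
# in the style of the W3C Web Annotation data model.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "describing",
    "creator": "https://example.org/curators/object-database",  # hypothetical curator
    "target": "https://example.org/abs/paper-identifier",       # hypothetical paper URL
    "body": {
        "type": "TextualBody",
        "format": "text/plain",
        "value": "Object NGC 0000 is mentioned in this paper; "
                 "the reported redshift is z = 0.182.",
    },
}

print(json.dumps(annotation, indent=2))
```

Because the annotation lives outside the paper and points at it by URL, a third party can harvest and republish it without touching the copyrighted article itself.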
So we are able to provide that both as a filter and as a way to link the papers back to the data products. The same goes for the astronomical objects you see on the left, labeled SIMBAD objects here. If you look at the full article details, you can see that we provide links to references and citations, so to other papers, as well as to the full-text article and to the data products in the upper right-hand corner. We do all of this by aggregating these resources, and we then expose them so they can be reused in other contexts. This is one of the applications on the Elsevier platform that uses our API to pull in the data sets related to a particular paper Elsevier published; they are able to link them to the object databases thanks to the fact that we have aggregated that information and exposed it to them.

One of the reasons this effort has been so successful, I think, is that our community buys into the philosophy that making data free and open is a big win for everyone. I call this the linked open data advantage, even though it's not Linked Open Data in the formal sense, and I know there are people in the audience who have written about that, so I mean linked and open in the general sense of the words. A few years ago it was shown in papers by our colleague Rick White that the Hubble Space Telescope data, one of the greatest data sets in astronomy, is reused at a higher rate than newly taken data: in one particular year, archival data produced 321 papers, versus 252 papers from the new observations. So well-linked, well-curated data is used data, as you probably all knew.
Here I reference the recent paper by Heather Piwowar about reuse of data in biology and, at the bottom, our own study of the citation advantage for papers in astronomy that have linked open data sets available: we find about a 20 percent citation advantage for those papers. Up to now, when I show these slides, a lot of people have the general reaction of: wow, you guys really have your act together, you've got it all figured out, there's nothing left to improve. And I'm here to say that no, there are a ton of things we're not doing as well as we could, and there are lots of opportunities to improve on what we've done. So here's a list of problems, and hopefully some of the clever people around here can help us with them.

All of the links I described, and the collections of information that go into creating them, are the result of often tedious workflows carried out by different archives, on systems that were built, for the most part, fifteen years ago. In many cases it's librarians who do this, and they are focused on the task; they're not able to do the level of programming and enhancement of the systems that we would like. So the curation happens in closed environments. A lot of the data collected during that curation is stored in local databases, so it's never exposed to the public, and it's hard for anyone, ourselves in particular, to reach it and expose it. We mitigate some of these shortcomings by harvesting and then linking back to this content, but it's not linked data in the real sense, because we can't follow our nose and find out more about any of it. Here's an example of one such workflow. It starts with a literature search, done by one librarian at the European Southern Observatory. The full text is scraped, from a PDF actually, and entities are extracted.
Then there's a triage step in which proposal IDs and observation IDs are identified. They're put in a database, and the database is locked away. All we have at the end is the list of bibliographic identifiers and data products that is pushed out to the public; all the annotations made along the way are stored away. Here's another project, called ScienceWise, that lets people do semantic tagging of articles on arXiv. Here the user is allowed to associate tags from an ontology with papers. Again, each user can do this, and the information is stored on their platform but not otherwise exposed to third parties. And here is yet another case, one that makes me cringe even more. Our colleagues in France who maintain a database of astronomical objects regularly scan new PDFs produced by the publishers. They have these beautiful tools that, using an extensive knowledge base, find all mentions of objects in a document. They actually create an annotated PDF with all the object names; then a librarian extracts the list of identifiers, puts them in the database, and this beautifully annotated PDF is thrown away, because they can't republish it. It's copyrighted material, so it goes in the trash bin. It's true.

So how can we fix this? Here are some of my ideas. First of all, we need to provide a platform, a web-based platform, where this work can happen: stop rolling your own and doing it in your own little courtyard; we have to provide this at a global scale. We think a web-based portal should support different roles. Scientists can create their own annotations and reuse them, but librarians in particular, who are a trusted source of curated metadata, are the best informed and best prepared to create the annotations that we need to incorporate into our system.
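The entity-extraction step in a workflow like the ESO one above can be sketched very roughly as follows. The proposal-ID pattern here is a made-up example for illustration, not the archive's actual identifier scheme.

```python
import re

# Illustrative sketch of the entity-extraction step in a curation
# workflow: find proposal IDs mentioned in a paper's scraped full text.
# The ID pattern below is a made-up example, not a real archive's scheme.
PROPOSAL_ID = re.compile(r"\b\d{3}\.[A-Z]-\d{4}\b")  # e.g. "090.A-0123" (hypothetical format)

def extract_proposal_ids(fulltext: str) -> list[str]:
    """Return the unique proposal IDs mentioned in the text, in order of appearance."""
    seen = []
    for match in PROPOSAL_ID.findall(fulltext):
        if match not in seen:
            seen.append(match)
    return seen

text = ("Observations were carried out under programmes "
        "090.A-0123 and 091.B-0456; see also 090.A-0123.")
print(extract_proposal_ids(text))  # → ['090.A-0123', '091.B-0456']
```

The point of the portal idea is that the output of a step like this would be published as open annotations against the paper, rather than locked in a local database.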
We basically think that if we can enhance our search to incorporate ontologies and semantic annotations, this will go a long way toward helping, and then of course we want to make the results available as open annotations to third parties, so they can be integrated into other products and, in particular, back into the publishers' platforms. This would be something shared across the community. Why are we the right venue to do this? Well, we have the full text of the current and past literature; the major publishers, like Springer and Elsevier, actually give it to us for the purpose of indexing. We already have a system that supports tagging by end users, through the creation of what we call private libraries. We are developing tools to describe the knowledge in our discipline: we have an astronomy thesaurus available as a machine-readable SKOS file, and there are other lists we can use. And we have a group of librarians who are very eager to participate in the process.

If we were to do it: here is an experiment, a prototype that exists just to show, as a proof of concept, what could be done. We could track the flow of information from the creation of a proposal to observe a particular set of data, through to the end product, the list of papers that have been published using that data, and then from that generate, for instance, a set of metrics to evaluate the scientific impact of the proposal. All of this you would get for free if everything were linked together as I described, and this is the place where you can go to see the prototype. So in summary, and sorry for going over: I think we have a very good platform from which to greatly increase the impact of all this activity. We have the right content, we have the community culture, and we have the means to take this opportunity and make things better.
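The proposal-to-papers metrics idea can be sketched as a toy computation over linked records. Every identifier and citation count below is invented for illustration; in the prototype these links would come from the aggregated annotations, not from hand-written tables.

```python
# Toy sketch of the proposal-to-papers metrics idea: given links from a
# proposal to its data sets, and from data sets to the papers that used
# them, compute simple impact numbers. All records are invented.

# hypothetical linked records: proposal -> data sets -> (paper, citations)
proposal = {"id": "PROP-2010-042", "datasets": ["DS-1", "DS-2"]}
papers_by_dataset = {
    "DS-1": [("paper-A", 40), ("paper-B", 12)],  # (paper id, citation count)
    "DS-2": [("paper-B", 12), ("paper-C", 7)],   # paper-B used both data sets
}

def proposal_metrics(proposal, papers_by_dataset):
    """Aggregate the papers (deduplicated) and citations traceable to a proposal."""
    papers = {}
    for ds in proposal["datasets"]:
        for pid, cites in papers_by_dataset.get(ds, []):
            papers[pid] = cites  # dedupe papers that used several data sets
    return {"n_papers": len(papers), "total_citations": sum(papers.values())}

print(proposal_metrics(proposal, papers_by_dataset))
# → {'n_papers': 3, 'total_citations': 59}
```

This is the "for free" part: once proposals, data sets, and papers are linked, impact metrics are a simple traversal of the graph.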
Unfortunately there is no funding so far for any of this; there's only goodwill on the part of our collaborators. But we think that if we can't do it for this field, and then extend it to the physical sciences, it's going to be that much more difficult to do it anywhere else. Thank you.