Good morning, my name is Marcela Mora. I'm a tropical botanist working at the Missouri Botanical Garden, and today my colleague William Ulate and I are going to talk about the annotation needs of the botanical community in a digital library. The Center for Biodiversity Informatics of the Missouri Botanical Garden has been involved in the creation of different online repositories, making biodiversity information available to researchers, students, and citizen scientists globally. Here we have examples of those repositories: Tropicos, Botanicus, the Biodiversity Heritage Library, and World Flora Online. What all those repositories have in common is that they contain taxonomic information. For those who don't know, taxonomy is the science of describing, naming, and classifying living and extinct organisms. The taxonomic literature has existed for over 250 years. Carl Linnaeus was a Swedish botanist who is considered the father of taxonomy, and coincidentally, today is his 312th birthday. Species Plantarum, published in 1753, was the first work in which binomial nomenclature was applied, so it is because of Linnaeus that we now use binomial nomenclature. For example, Homo sapiens is composed of two parts: Homo is the genus, and sapiens is the specific epithet. Since that publication, botanists have published more than 1.2 million plant names. Thanks to the efforts of the Missouri Botanical Garden, we now have access to historical botanical literature through a portal called Botanicus, which contains about 2,000 titles (books and journals) and two and a half million pages. Here we see how it looks. It has OCR transcription, but it doesn't have annotation functionality. One real use case in which annotations are important and useful for botanists is represented here.
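The two-part structure Linnaeus introduced can be sketched in code. This is purely an illustrative example (not anything from the talk or its projects), showing how a binomial splits into genus and specific epithet:

```python
# Minimal sketch: splitting a Linnaean binomial into its two parts,
# the genus and the specific epithet (illustrative only).
def parse_binomial(name):
    """Split a binomial such as 'Homo sapiens' into its two components."""
    parts = name.split()
    if len(parts) < 2:
        raise ValueError("not a binomial: %r" % name)
    genus, epithet = parts[0], parts[1]
    # By convention the genus is capitalized and the epithet is lowercase.
    return {"genus": genus.capitalize(), "epithet": epithet.lower()}

print(parse_binomial("Homo sapiens"))      # {'genus': 'Homo', 'epithet': 'sapiens'}
print(parse_binomial("Mangifera indica"))  # {'genus': 'Mangifera', 'epithet': 'indica'}
```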
Sandra Knapp is a botanist from the Natural History Museum in London. She was looking at a plant specimen and wanted to know who the collector was, but she couldn't read the handwriting, so she asked on Twitter whether anybody recognized the name. After a long search, somebody finally worked out who the collector was. If this name had been annotated, it would have been discovered very easily. So although many digital libraries have OCR transcription functionality, historical manuscripts still present a lot of challenges: uneven inking, irregular orthography, multilingual text, and the poor quality of the pages being digitized. In some cases the output is total gibberish. In other cases, even when the OCR is perfect, a single name can appear in many different forms; here, Archibald Byron Macallum is written in 17 different ways. Again, annotation would be very useful in this case. Among taxonomists, botanists have a wide array of standardized reference tools. Here, for example, we have the International Plant Names Index, in which you can search by plant name, by author, or by publication. If you search for Mangifera indica, most commonly known as mango, you will find this information. It tells you the family is Anacardiaceae, the same family as poison ivy. The species is Mangifera indica, and it has an ID associated with it. You can also see that it says "Mangifera indica L." That "L." is an author abbreviation that has been standardized for a long time; it stands for Linnaeus, and you can see the name was published in 1753. The publication is also given as a standardized abbreviation; the full name of the publication is Species Plantarum, and it too has an ID.
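The problem of one name written in 17 different ways is, at its core, a fuzzy-matching problem. As a sketch (under the assumption that a curated list of canonical names exists; this is not the project's actual pipeline), the standard library's difflib can collapse spelling variants onto a canonical form:

```python
import difflib

# Hypothetical canonical name list; in practice this would come from a
# reference tool such as IPNI or a curated collector index.
CANONICAL = ["Macallum", "Pringle", "Linnaeus"]

def normalize(name, cutoff=0.6):
    """Return the closest canonical name, or None if nothing is close enough."""
    hits = difflib.get_close_matches(name, CANONICAL, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(normalize("Macallun"))   # 'Macallum'
print(normalize("MacCallum"))  # 'Macallum'
print(normalize("Xyzzy"))      # None
```

The cutoff is a similarity ratio between 0 and 1; in a real workflow it would need tuning, and ambiguous matches would still have to go to a human annotator.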
Taking advantage of all these tools, the Center for Biodiversity Informatics in the past conceptualized and designed different projects to annotate manually and automatically within a digital library. Here we have examples of those projects: Science Gossip, Mining Biodiversity, and Purposeful Gaming. However, when some of the products of Mining Biodiversity were evaluated by potential users, the results did not give a strong indication of whether the features were really wanted. So we came up with the idea of a planning grant, funded by IMLS, to look for the real needs of botanical users in terms of annotation, because we IT people can come up with great ideas that, in the end, users really don't find useful. We already had a one-year extension; we presented here last year what we were going to do, and now we've done some of that. One thing I want to mention is that we came up with several hypotheses that we wanted to test. We thought maybe the tools or the technologies are really not there yet, or the effort required to employ them outweighs the benefits, as was mentioned in an earlier talk. Perhaps it's a matter of technology adoption. We even considered that some users might think, "we don't know what to do with it beyond what we already know." Or maybe it's an age gap, or in some cases a digital gap. Those were our questions. We also found that it might not be the taxonomists who end up using these tools: taxonomists have the knowledge to annotate, but they might not be the ones who get the benefit out of it. Most of them, as I said, couldn't recognize the value, or potential value, of those tools, and they didn't see themselves dedicating much time to annotation.
Even in a previous test case, we had a tool that allowed annotation for six months. It was proprietary and we had some issues with it, so we had to take it out. But we found that some annotations had been made, and as I said, they were not from the taxonomists themselves but from the botanical community in general. That's why we created this project. We did find some of the use cases, some of the things we thought botanists would usually do, but we really wanted to make sure those were the real needs. So we came up with a project to analyze the web annotation needs of the botanical community and, in the process, prioritize them, create a prototype to show how these needs could be addressed, and then evaluate some of the existing tools to see which ones would meet those needs. The audience of this project is, of course, the users of a digital library: the librarians who are looking to improve their virtual library by enabling users to add content to it; the developers who want to choose a tool to enable annotations in their online solutions and digital platforms; and the botanists who want to enhance the corpus of the digital library collections with their knowledge. We designed a survey and aimed for four deliverables: a needs analysis report, feasibility studies of the existing tools, a proof-of-concept prototype, and the project outcomes. So far we have done the survey, and I'm going to go quickly through some of the results, because we don't have too much time. We interviewed 40 members of the botanical and scientific communities from 10 different institutions in nine countries. We got a diverse sample with a lot of Latin American representatives, because that's where most of the plant biodiversity is. Two-thirds of them identified themselves as female.
Half of the people had a botanical background, but we also had entomologists, ecologists, and even an ex-lawyer who was actually making annotations. And we included both kinds of people: those who knew they were annotating, and those who didn't think they were annotating but realized that some of the things they did were actually annotations. The optional demographic questions also showed that one-fifth of them were younger than 34 years, one-fifth were older than 55, another fifth was between 45 and 54, and two-fifths were between 35 and 44. We had all sorts of profiles, not only curators or students; we even had Wikimedians, project managers, professors, and so on. We asked them: what do you annotate? Of course, they always say specimens first, because that's what they do, they go and annotate the specimens, but we were interested in the digital library. So we tried to guide the questions toward what else they annotate: whether they annotate in the margins, on PDFs, and so on. They came up with the usual books, articles, and images, and also photocopies and printed articles, annotations made while reviewing papers, images of live plants, and so on. Chapters within a book: now that sounds more like what we're trying to find for our digital library problem. We also asked at what granularity they were annotating. Some annotated the whole resource, some at the page level, some text within a page, and, one important thing, a region within a page. Some of these digital libraries, as you saw, are actually collections of page images, so the region they're talking about is sometimes important. They also came up with other levels at which they were annotating. We also asked: why do you annotate? They came up with quite a few, well, many annotation reasons. When we analyzed them, basically and quickly: at work, they tend to share more.
If they are personal annotations, they keep them to themselves. In general, they had all the usual reasons: comprehension, recall, discussion, collocating things that go together, linking, peer review, and so on. But they also had reasons specific to the field: georeferencing, like we saw yesterday, morphological features, and habit descriptions (whether it's a tree or a grass is very important to them). Also correcting names, because names have changed over time, so they usually want to do that. We also asked: at what stage of the research process do you annotate? The answers were all over the map. Some do it at the beginning, some at the end, so any tool would have to allow adding annotations at every stage. We also asked how often they do it. It turns out most of them annotate daily, and some even hourly; they just keep annotating everything they want, it's part of the job. Others were weekly, and so on. What methods do they use? This is where we had to open things up, as I said. At first it was, "the only thing I know is PDF, Adobe Acrobat." Then we started asking: what about when you go to discussions in journal clubs, when you get together, when you correct things? And they started opening up about other tools they were actually using; they had never thought of those as annotation. We have Kindle: they like, for example, that Kindle shows the most highlighted passages, which tells you something about what you're reading. Disqus, as I mentioned, WordPress, Zotero (which was mentioned before), and so on. Some things are specific to the field: of course the specimens and their labels, which they always mention first, but then also proprietary software to annotate microscopy photographs. And some infrastructure actually exists out there that allows them to make some sort of comments: iNaturalist, Notes from Nature, the Smithsonian Transcription Center, EOL, and so on.
They also use vocabularies and checklists: The Plant List, WoRMS, Catalogue of Life, and so on. We also asked them: why do you use these methods? What is it that you like so much about them? And yes, mostly it's what they were taught with, what they use, what they feel comfortable with. But some went on to talk about shareability, simplicity, flexibility, and so on. We asked them about those vocabularies and existing lists, and of course they mentioned the ones that Marcela mentioned. They even went to ontologies: some of them annotate with ontologies for more advanced processing of the content. Then we asked: how do you use your annotations? Some of them use them just once. Some leave them there forever and never come back to them. Some try to come back whenever there's a new project or a new use. We also found it interesting that some people wanted to keep annotations completely private, while others wanted to share them as soon as they wrote them. We even tried to find out whether there was some process of reviewing an annotation before making it public. We felt a pull between the researcher, who wants to keep it private because it might be the next great idea, or because "oh my God, it's my responsibility, they're going to see this comment that has nothing to do with anything, so I don't want to share that," and the citizen scientist, who sometimes feels that this is knowledge and we've got to set it free. So we asked: who do you share with? The responses were a balance between public and private; as I said, 4 out of 14 said "no one, just myself," and they will never share. The conclusion was that any tool would need functionality to keep annotations private, to share them within a group, and to share them with everyone.
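Those sharing levels can be sketched as a small access-control check. This is a hypothetical model, not the project's design; it includes a "registered" level for users logged into the system, which the talk also describes:

```python
from enum import Enum

# Hypothetical visibility levels for an annotation: owner only, a named
# group, any logged-in user, or everyone.
class Visibility(Enum):
    PRIVATE = "private"
    GROUP = "group"
    REGISTERED = "registered"
    PUBLIC = "public"

def can_view(owner, visibility, group, viewer, viewer_groups, logged_in):
    """Decide whether a viewer may see an annotation."""
    if viewer == owner:
        return True                      # owners always see their own notes
    if visibility is Visibility.PUBLIC:
        return True
    if visibility is Visibility.REGISTERED:
        return logged_in                 # any recognized, logged-in user
    if visibility is Visibility.GROUP:
        return group in viewer_groups    # shared within a named group
    return False                         # PRIVATE: owner only

print(can_view("ana", Visibility.GROUP, "mosses", "ben", {"mosses"}, True))  # True
print(can_view("ana", Visibility.PRIVATE, None, "ben", set(), True))         # False
```

A real tool would also need the owner to be able to move an annotation between these levels after the fact, which the survey respondents explicitly asked for.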
And we even found a fourth kind of sharing: with people who are logged into the system, because they are recognized users. It's not the public at large, and you always know who's talking, who's commenting, who's annotating your annotation. Do you read or see other people's annotations? "Never, we never do that." But then, yes, actually they do; it happens, even if they're not looking for it. So we ended up concluding: yes, annotations need to be discoverable. We need to be able to override or correct an annotation, and if we do that, maybe we have to handle versioning of some sort. About the process of vetting or reviewing: as they said, usually they keep annotations private. In some cases they do have, as we saw before, an editorial review process, which of course involves a lot of reviewing back and forth. But a tool has to offer the option of making an annotation private and later changing it to group-visible, public, and so on. What information do you put in an annotation? As taxonomists: names, habitats, corrected names, geographic locations (we knew that), notes, reviews, links, and so on. Many of these map easily onto the W3C Web Annotation motivations, and we are doing that mapping. An annotation should allow rich text, so you should be able to include text and images, and a link should be recognized so you can just click on it and go where it points. Ideally, if the target is an image with many specimens, you should be able to select just the one you are referring to. Our last question was: how could your process be improved? They said things like: make it very easy to integrate with all the current existing software; implement search and sort functionality over annotations to create reports and other useful things; make previous annotations reusable; and support some sort of configuration. They also kept making recommendations.
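The features described here, a motivation, a rich-text body, and selecting one region of a page image, all have direct counterparts in the W3C Web Annotation Data Model. As a sketch, here is what one such annotation could look like as JSON-LD; all IDs and URLs are invented for illustration:

```python
import json

# Sketch of a W3C Web Annotation targeting a rectangular region of a
# scanned page. Every identifier and URL below is invented.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "identifying",            # one of the W3C motivations
    "body": {
        "type": "TextualBody",
        "format": "text/html",              # rich text: markup, links, etc.
        "value": "Collector: A. B. Macallum",
    },
    "target": {
        "source": "https://example.org/botanicus/page/12345",  # invented URL
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://www.w3.org/TR/media-frags/",
            "value": "xywh=120,80,300,60",  # x, y, width, height on the image
        },
    },
}

print(json.dumps(annotation, indent=2))
```

The `motivation` field is where the survey answers (identifying, commenting, linking, correcting names via `editing`, and so on) would be mapped, and the `FragmentSelector` is what lets an annotation point at a single specimen within a multi-specimen page image.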
Sharing was important; that's again about the existing tools, Zotero and so on. One last thing they asked for was a talk page. In some of our past projects we have had a talk page, and it turned out to be very useful for the community because people share with each other there. Even the power users come out and start handling things the tool doesn't support; they just find a way to cope, whether it's lists of authors, lists of artists, and so on. Some of the requirements we came up with: anonymous login, privacy, support for some sort of editorial workflow, storing annotations not only locally but always globally (and optionally also a local copy), and showing all annotations by default while being able to hide and filter them. Finally, we did a prioritization of those needs. We came up with 40 requirements: 19 must-haves, 15 that we should probably do, and 10 that would be nice to have. We also listed about 15 assumptions, and we even came up with five questions that we still have to figure out how to deal with. As I mentioned before, the fourth kind of group is anyone who is registered; it's not public, it's just anyone who's registered, because then you know who is commenting or doing whatever they do. Finally, we are now working on the feasibility studies, and Marcela is going to help us with that: trying some of those tools (we even heard yesterday of others we might have to look into, and we'll be glad to try them), checking them against the list of requirements we have, and figuring out what changes would need to be made. This is an exploratory project, so we want to do a bigger project with partners, asking: how do we get whatever tools exist to comply with these requirements, so we can recommend them to our users?
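Evaluating candidate tools against a prioritized requirements list can be sketched as a simple weighted score. This is a hypothetical illustration of the MoSCoW-style prioritization described above; the requirement names and weights are invented, not the project's real list:

```python
# Hypothetical MoSCoW-style weights: must-have requirements dominate.
WEIGHTS = {"must": 10, "should": 3, "could": 1}

# Invented example requirements, standing in for the project's real 40.
REQUIREMENTS = {
    "group-level privacy": "must",
    "image-region selection": "must",
    "annotation versioning": "should",
    "talk page": "could",
}

def score(tool_features):
    """Sum the weights of the prioritized requirements a tool satisfies."""
    return sum(WEIGHTS[priority]
               for req, priority in REQUIREMENTS.items()
               if req in tool_features)

tool_a = {"group-level privacy", "image-region selection", "talk page"}
tool_b = {"annotation versioning", "talk page"}
print(score(tool_a))  # 10 + 10 + 1 = 21
print(score(tool_b))  # 3 + 1 = 4
```

In practice a tool missing any must-have would probably be disqualified outright rather than just scored lower; the sketch only shows the bookkeeping.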
And finally we did a prototype, which we're still developing, trying to see how the annotations would work with images, how to handle the things we are recommending, and whether it's possible or not. This has been stored in Rarum as a repository. Next steps: identify requisites and best practices for the development, and so on. And of course, involve any partners that you or anybody else can recommend to us; please contact us for that. I'll be...

People may be thinking about lunch, which is our next step, but do we have any questions for these fine folks? People may be too hungry. Okay, well, they will be here. Oh wait, we have one.

Thank you for your presentation, and I like how you built on what you talked about last year. For the identifiers that are used in the different publications, have you thought about using annotation to connect up with the places those identifiers might resolve to, and showing that on top?

Yeah, that's the whole point: to leverage that. Because that way, if the tool could show and look up those identifiers, then we would already have a hard link. That way, Macallum would not be a nightmare; you would just find the right one. Yes. Okay, thank you. Thank you.