My name is Don Brower. I'm at the University of Notre Dame, in the Navari Family Center for Digital Scholarship, and I'm going to present how we organized and ran some student annotation projects last year. I'd like to first acknowledge my colleagues who collaborated on this. It was a team effort; in particular, Peter Cornwell of Data Futures, Natalie Meyers and Julia Vecchio, both from Notre Dame, and Eric Decker at the University of Basel all contributed to making this successful.

Over the last three semesters we ran seven annotation projects. We had been planning some of these projects for a while, but they were all prioritized in response to our institution's COVID-related stay-at-home orders. Suddenly all of our library student workers, and student workers from elsewhere on campus, needed a way to work remotely so they could still be paid for their work-study positions. These projects helped fill that need, and they also produced some useful scholarship.

We focused on what we call structural annotations. These are the kinds of annotations that help you, or help others, find your way through a pile of images, or that locate things used for further analysis. Examples would be finding chapter headings, locating diagrams, or identifying map legends; sometimes even marking page numbers can be useful. Our projects ranged over a variety of corpora: Epistemological Letters, which was a kind of 1970s physics-and-philosophy zine; some Vietnamese government newsletters from the 1950s; a medium-sized map collection; and finally the East Asian treaties, which ran to tens of thousands of pages of scanned text. Each of these projects collected different types of metadata, and two of them have had their results pushed into custom Invenio repositories for access.

Alright, to set the stage and give us something concrete to think about, this is what the annotation viewer looks like. We're using Mirador 2. The image is in the middle, with annotation tools in the upper corner and workflow links in the bottom left; I'll talk more about the workflow later, since it's actually a key piece. And this is what an annotation looks like. In this case we're identifying the title of a map, so the annotation has some metadata: a transcription of the title, plus a classification of the page type, because this collection includes road maps, for example, and we want to be able to pull those out. Here's another example of an annotation, this one from the physics zine, where we were identifying equations.

Okay, let me go over the technology really quickly. The technology is standard and as simple as possible. The items being annotated were stored as image resources on a IIIF server. We kept the annotations in a separate database, as standoff annotations. We used ORCIDs to identify the annotators and to sign them in. We used Mirador 2 instances, a custom one for each task, because we had to adjust the metadata collection for each. And we used the Data Futures Annastore service as the workflow server: it tracks each unit of work, who is working on it, whether it's finished, and other workflow state.
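To make the standoff approach concrete, here is a minimal sketch of what one of those map-title annotations could look like, modeled on the W3C Web Annotation Data Model. The canvas URI, region, field values, and ORCID are illustrative assumptions, not the actual Annastore record format.

```python
import json

# A minimal sketch of a standoff structural annotation in the style of the
# W3C Web Annotation Data Model. All identifiers below are placeholders.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "describing",
    "body": [
        # transcription of the map title
        {"type": "TextualBody", "purpose": "transcribing",
         "value": "Example map title transcription"},
        # page-type classification, e.g. so road maps can be pulled out later
        {"type": "TextualBody", "purpose": "classifying", "value": "road map"},
    ],
    # the target is a region of a IIIF canvas, so the annotation can live in a
    # separate database from the images themselves (the standoff arrangement)
    "target": {
        "source": "https://iiif.example.edu/maps/canvas/42",
        "selector": {"type": "FragmentSelector", "value": "xywh=210,95,1400,180"},
    },
    # the annotator is identified by ORCID, which travels with the annotation
    "creator": {"id": "https://orcid.org/0000-0000-0000-0000", "type": "Person"},
}

print(json.dumps(annotation, indent=2))
```

Keeping the record in this shape means the annotation database only ever points at the IIIF canvases; the images themselves never have to move.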
So, people. Overall we had many students, undergraduate and graduate, express interest; we also had some library faculty and staff, and some other people, including, I think, a high school student who signed up. We had about 19 annotators from at least two different institutions, and they made more than 10,000 annotations over the three semesters. A subset of them, call them the super-annotators, did the majority, probably around two thirds, of the annotations.

With that many people, one of the two big things we had to do was keep everyone organized. To do that we made a LibGuide page and used it as the home base: all links and documents were put there, organized by project, along with links to the sign-in page and the other websites. We used a mailing list for communication and tried to keep regular, roughly weekly, status updates going on it. We also wanted annotators to know that there were people actually here supporting them, because remote work can get lonely, so we encouraged them to contact us with questions, and we held virtual office hours because some people preferred spoken or hands-on examples.

Training was probably one of the harder parts. Our workers were at many different levels of ability, and they could come and go as needed or as desired. To address this we had a few strategies. First, each distinct task had a document describing how to do it, complete with screenshots and examples of what to do and what not to do. Sometimes we went into extreme detail just to make sure people would do things in a consistent way. We chose annotation tasks that did not require expert knowledge, tried to give clear guidelines for what we expected, and tried to eliminate as many judgment calls as we could. We also encouraged annotators to reach out to us with any questions. As they went through the material, especially the thousands of pages of treaties, there were situations we didn't expect: fonts would change, or the typography or the way things were organized would change, so we often had to address special situations. In those cases we would go back and update the task documents, maybe add some FAQs, and adjust them.

For the most part we didn't have too many problems with QA. Some projects didn't have much of a QA process; maybe the project curator would spot-check annotations. For others it was much more extensive: the project curator, possibly with an assistant, would do a much deeper review, perhaps looking at everything. For systematic problems we would again go back and update the task documents; often this would be a case where there was a subtle ambiguity or an easy way to misinterpret something, so we would fix that, and then we would send notes to the mailing list to clarify what we had noticed and address the ambiguity. The students were generally very conscientious about quality, and some of them went back and fixed their annotations after we pointed these things out.

So, moving on, this is an example of what the raw annotations look like at the end. This one is from the treaties: we have a canvas identifier and the region of the annotation on the page.
We also have the metadata, in this case a transcription of the text, a date, and the annotator who made it. Here is another example of some annotations: in this case we pulled out titles along with their sequence numbers, which here are two-dimensional sequence numbers.

Okay, so the other problem we had, besides organization, was keeping the work moving. The annotators could really chew through a lot of work, especially when we had many of them working at the same time, so finding more work was a constant, immediate concern; this was probably the biggest challenge. Some of the smaller projects the annotators worked through in a few days to a week, while the larger projects lasted months, which was nice. We wanted work that was useful and not just work for work's sake, so we tried to find things that were well suited to annotation, especially structural annotation, and things where we already had source images ready or almost ready, because we did not want to have to go through a digitization project first.

Once we chose a project, we assigned it a curator, someone internal who would be responsible for the research side of things. They would set the goal and figure out what metadata would be useful, and we would work with them to figure out what we needed to collect. Sometimes this was an actual research PI; sometimes it was a domain expert who understood the material.

Then we chose the task breakdowns. Again, we chose things that did not require expert knowledge or judgment. The most important step here was breaking things down into work units; the rough idea was that an annotator should be able to finish a work unit in about five to ten minutes. Usually this would be a single page or a small packet of pages, so we would make a sub-manifest for just those few items. In our workflow, the Annastore service would hand the annotator a work unit, they would work through it, and then they would click Next, which committed the work and marked that unit as finished. It's a good way to get a lot of people working in parallel without duplicating effort on these corpora. I'll show a rough sketch of this claim-and-complete loop in a moment.

Alright, so that was the how of what we did. Now I'd like to discuss some thoughts about how well the whole thing worked out.

It turned out that using ORCID was a very good idea. We could handle people from different institutions, and we did have a few; we also had, as I said, a high school student, so we had people with no institutional affiliation at all, and that was no problem. Additionally, we now have a persistent scholarly identifier for each person, and we associate it with their annotations, so that when the annotations get reused we can cite everyone who contributed. It was also a teaching moment for the students, who could learn what ORCIDs are. And it has turned out to be genuinely useful since then: we've used ORCID as a sign-in method for other digital projects we've been working on.
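Coming back to the work units for a moment, here is a rough sketch of what the Next button amounts to, assuming a hypothetical workflow endpoint. The URL, routes, and payload fields are invented for illustration and are not the actual Annastore API.

```python
import requests

# Hypothetical workflow server URL and an example ORCID; both are placeholders.
ANNASTORE = "https://annastore.example.edu"
ORCID = "https://orcid.org/0000-0000-0000-0000"

# Claim the next available work unit (roughly a sub-manifest of a few pages).
# The route names and response fields are assumptions for illustration only.
resp = requests.post(f"{ANNASTORE}/workunits/claim", json={"annotator": ORCID})
resp.raise_for_status()
unit = resp.json()  # e.g. {"id": "...", "manifest": "https://...", "pages": [...]}

print("Work on:", unit["manifest"])

# ... the annotator opens the sub-manifest in Mirador and saves annotations ...

# Clicking Next in the viewer amounts to something like this: mark the unit
# finished so the workflow server never hands the same pages to someone else.
done = requests.post(f"{ANNASTORE}/workunits/{unit['id']}/complete",
                     json={"annotator": ORCID})
done.raise_for_status()
```

The point of the five-to-ten-minute work unit is that claims and completions stay small, so many annotators can run this loop in parallel without stepping on each other.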
We were very lucky to already have a prior relationship with Data Futures, and their Annastore platform was the key piece that let everyone work in parallel without duplicating effort; as I said, that was very important. We met our goal of providing useful remote work for student workers, which, a year ago, was probably the most important thing driving this. On top of that, it was also a good moment within the library: we could show other people what annotations are and how they are useful, and we now intend to offer this as an internal service to others. In fact, we have a few projects already lined up just from showing people what we've been doing.

Alright, now for some of the problems we ran into. The first was that it was simply challenging to find work to keep people busy. One of our goals was to provide useful remote work, and that criterion made things hard, because it takes a long time to develop projects. Now that the immediate need is gone, since we've largely been going back to onsite work, this is not such a big deal, but it was definitely a problem at the time.

The technology infrastructure worked great once it was set up, but getting it set up took real effort; it felt like we were dealing with just slightly more complexity than we could grasp, with all the servers and identifiers. It was a lot. In addition, one thing I would personally like to see is more around discovery of, and access to, annotations. We can view the annotations today, we have access to them, and we can pull them out of the database because it supports a WADM API; I'll show a quick sketch of what pulling annotations back out might look like before I wrap up. But it would be nice to have better discovery or exhibit platforms for these kinds of annotations.

And for next time, or rather for our next projects, since we do intend to keep these going: I definitely want to focus on community. Ideally we would help people get into the subject matter of what they're doing; since they are going to be going through all the images anyway, they should at least have an idea of how the work will be useful and how they are contributing to research and to answering research questions. I would also like a better way of tracking progress. This might even be a slight gamification, maybe a leaderboard, I'm not sure, but giving people more feedback about what has been done versus how much is left would be good. Another one: we are working on getting better checklists in place for new projects, and on developing a vocabulary for describing what an annotation workflow will be, so we can use the same descriptions with everyone, from the curator to the developer. Right now it often feels like we are re-explaining what we want in different terms, because everyone uses a different vocabulary.
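And coming back to the WADM API for a second, here is a small sketch of pulling annotations back out by walking a W3C AnnotationPage. The endpoint URL and query parameter are hypothetical; only the AnnotationPage, body, and creator structure follows the Web Annotation model.

```python
import requests

# Hypothetical WADM-style endpoint that returns a W3C AnnotationPage for one
# canvas; the URL and query parameter are placeholders, not the real service.
url = "https://annastore.example.edu/wadm/annotations"
canvas = "https://iiif.example.edu/treaties/canvas/7"
page = requests.get(url, params={"canvas": canvas}).json()

# Walk the AnnotationPage and print each transcription with its annotator,
# which is roughly the shape of data we push into repositories for access.
for anno in page.get("items", []):
    bodies = anno.get("body", [])
    if isinstance(bodies, dict):      # a single body may not be wrapped in a list
        bodies = [bodies]
    for body in bodies:
        if body.get("purpose") == "transcribing":
            creator = anno.get("creator", {}).get("id", "unknown")
            print(f"{body.get('value')!r} (annotated by {creator})")
```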
Alright, that's my presentation. I guess I'm ending a little early, so we can see if there are any questions.

Yeah, thank you. That was a really lovely reflection on a nice project that got quickly driven by some urgent needs. There are a couple of questions here. I'll ask the first one, which asks how much work it took to customize the Mirador annotation form.

That's a good question. I'm not the developer who did that, so I don't know exactly, but I think at this point it is fairly routine, though it still takes some effort. This was done by the developers at Data Futures; they have about thirty projects or so, and they can usually turn a form around in a day or two, depending on their other workloads. But it's a good question. In fact, when I look at a lot of these pieces I keep thinking it should be easier, so this is definitely something I would like to spend more time on.

Great. Another question asks if you can give an example of a complex annotation scenario.

I'm not sure; it depends on how you interpret the word complex. I suppose one situation could be complexity in terms of the amount and variety of metadata you collect. For the Vietnamese newsletters we collected the tables of contents. Among other things, we had people transcribe them, because they were typewritten in a bad font and OCR didn't work very well, and we had them mark the page number in the table of contents in bold, so that we could machine-process it out later and keep track of it. Beyond that, I'm not really sure what would count as a complex annotation.

Okay, it looks like I can see the questions, so I'll move on to the next one. The question is: great to see good examples of crowdsourcing, including paid work; any more insights to share about your experience?

Yes, that is a very good point. I think them being paid was good for many reasons. It was good for the students, because they are definitely spending their time on this, and it is a good incentive to actually get people to work on it. We noticed a drop-off this past April as people started coming back on campus; the number of annotators went down. So I do think that going forward we will want to keep a paid element, or at least have people whose job it is to work on this. Crowdsourcing is fun, and bringing people in is great; the challenge is making sure everything works at a lowest common denominator. Another thing we looked at but didn't do, since annotations have worked so well for us, is using something like Zooniverse for things that aren't quite at this level. But overall it has been great, and like I said, the biggest thing with crowdsourcing is communication, making sure everyone feels like they are part of something bigger.

That's wonderful. I'm afraid I'm going to have to call it there, but the good news is that we can keep the chat going on the conference platform, if you're willing to answer the questions that are there.
We can start the next session now, so I'm going to say a big thank you to Don for this presentation and for provoking a lot of good questions. Thanks everybody for joining us, and we hope to see you at future sessions later this week. Thanks so much.