Let me welcome you to the final project briefing of the first week of the CNI Fall 2020 virtual member meeting. I'm Cliff Lynch, the director of CNI, and I have just a couple of quick logistical things before I introduce the speakers. This session is being recorded, and the recording will be available later. Closed captioning is available if you'd like to turn it on. We do have a chat running; please feel free to use it as the session proceeds, and you're welcome to use it to introduce yourself as well. We have a Q&A tool at the bottom of your screen, and you can use that to queue up questions at any point during the session. We will deal with all the questions after the presentations in a question-and-answer session moderated by Diane Goldenberg-Hart of CNI, who will beam into existence when the presentations are done. I think that's all of the logistical things I want to cover.

We have two speakers with us today, Allyssa Guzman and Albert Palacios, both from the University of Texas at Austin. They are going to tell us about a very interesting NEH-funded project that they have underway. There are aspects of this that remind me a bit of certain kinds of crowdsourcing projects related to archives, but there are also some very important differences. There's also a fascinating multilingual aspect of this, which I think they're going to fill us in on. I'm really looking forward to hearing about this. With that, let me just thank our presenters for being with us, and I'll turn it over to Allyssa, who will start the presentations.

All right. Thank you, and good afternoon, everyone. Like Cliff said, I'm Allyssa Guzman. I'm the Digital Scholarship Librarian at UT Austin, and I'm here with my project co-director, Albert Palacios, who's the Digital Scholarship Coordinator at LLILAS Benson.
We're here to talk about our project, which is called Enabling and Reusing Multilingual Citizen Contributions in the Archival Record, and I will explain what that means as I go along. Our work centers around a piece of software called FromThePage. FromThePage is software for transcribing documents and collaborating on transcriptions via crowdsourcing projects, and there are many libraries, archives, museums, and independent scholars around the world who use it to do simple text-based transcription projects or even create digital scholarly editions. The two links on the screen here point to our UT Libraries installation of FromThePage. It's open-source software, so we have our own installation, but Brumfield Labs, who are our collaborators on this project, also have their own installation of FromThePage that they run as software as a service. So if you're interested in using FromThePage without setting up your own instance, you can access it there.

Okay, I'm going to show you a little bit of the FromThePage interface with one of our transcribed manuscript pages from the Benson. The FromThePage interface has a window here where you can zoom in on the document; especially with these older handwritten documents, it can be a little bit difficult to read. Then there's a pane over here that shows the already transcribed document, and these words in red are place names that have been indexed. So if I were to click on this one, it would show all of the places where Oaxaca appears in this document.
To participate in a transcription project that's live, anyone can click Transcribe, and then the live text window appears and you simply type the transcription into the window over here. At the bottom of the page there are some transcription conventions, which can be updated for each project. There's also a translation feature, so if we were to translate this document into another language, there's the translation pane, and you can look at either the transcribed document or the image. And there's a nice dictation feature if you want to just read instead of type. So that's a little bit about what the FromThePage interface looks like. Moving back over to our presentation.

All right. In our project we decided to use the term "citizen contributors" to refer to the participants in these crowdsourcing projects. Oftentimes they're anonymous, and frequently they're not affiliated with our university at all, but they do provide meaningful intellectual labor in the service of these projects. They often have subject expertise around the documents they're transcribing, or language expertise, or even handwriting-reading expertise.

For our project we have two main goals. Number one is to break down barriers to access to our Spanish- and Portuguese-language collections for speakers of those languages, since most of the interfaces for the software that we use are in English. And we want to be able to collaborate with our current Latin American partners in the interpretation of their texts, which were digitized during our post-custodial archiving initiatives but are hosted on our servers.

Some of the considerations that we're making as we undertake this work: again, we're using FromThePage to work with citizen collaborators on these crowdsourcing projects, transcribing and translating primary source materials. Our end goal is to be able to reuse these transcriptions in other systems and projects, for example exporting the text of a transcription and ingesting it into a digital asset management
system to enable full-text searching, providing corpora for text analysis, or even simply providing access to hard-to-read handwritten materials. We recognize that this transcription work requires thoughtful interpretation and decision making. It's not simply filling in a form; it's not a rote task. We're situating it as an act of scholarship, and we recognize that our citizen collaborators, though they might not be affiliated with the university, possess meaningful knowledge that they're contributing to our projects. So we believe that they deserve to be credited for their labor.

We have three sets of deliverables for our project. The first is to internationalize the FromThePage interface into Spanish and Portuguese. That work has largely been completed, and we hope that the workflows that have been established can be used by other organizations who are interested in internationalizing FromThePage into different languages for their communities. I believe there's some interest in translating it into French and possibly German and Japanese.

We're also introducing faceted browsing features. As a relatively large institution, we have some very large collections of documents with many pages, and it can be difficult to find things within the FromThePage interface. So there will be basic metadata ingest and then the ability for collection owners to create custom facets.

And then finally, since we're really interested in enabling the reuse of these transcriptions and translations, we're looking at the export features of FromThePage to make sure that we're exporting documents in formats that can be ingested back into our digital asset management systems. We're also looking at exporting the attributions for our collaborators so that they can be credited in the metadata.

We did a user study in the spring of this year, where we surveyed and then did follow-up interviews with FromThePage project owners to learn about the challenges
that they're encountering in doing this same work. Some of the challenges that we identified were things like determining what actually counts as a meaningful scholarly contribution to a crowdsourcing project. Is transcribing a single page enough? Do they need to transcribe an entire document? Does participating as a reviewer or an editor count? Then there were challenges in actually gathering identifying user information. Since people can contribute anonymously, there's not a way to gather their user information, but for people who do decide to create a user account, what type of information do you want to share about them? There's now a field in the FromThePage user account where people can indicate what name they would like to be credited under.

We also encountered some issues around privacy, copyright, and the use of these transcriptions and translations: making sure that you are communicating to your project participants about the type of license that might be applied to their work, for example applying a blanket Creative Commons license to all of the contributions to a particular project. And then finally, the workflows. What do you actually do with the products of this work? How do you get it out of FromThePage, and what types of systems are you exporting it into? That varied significantly across the board between the different types of institutions whose project owners we interviewed. You can read more about our findings; we will share them more broadly with the other FromThePage project managers, and you can find them in our institutional repository, Texas ScholarWorks. And now I'm going to turn it over to Albert, who will talk about the internationalization.

Thank you, Allyssa. I will be going a little bit more in depth in terms of current work as well as future work for this project. The first part here, some of the work that we've already undertaken and are about to wrap up, is the internationalization of the platform itself. Now, most of this work
has been done, and I want to give due credit, because that's something that we're really focusing on in our grant project, to Joshua Ortiz Baco, who was our graduate research assistant for this project. He undertook the translation of the platform into Spanish and Portuguese alongside Brumfield Labs, Ben and Sara Brumfield, who are the developers of FromThePage. Together they have essentially created a shell for the platform itself, with dictionaries in English, Spanish, and Portuguese that enable users to toggle between them whenever they're using the platform. So a lot of the work that has been done in the past couple of months has been the translation of the platform itself. As I mentioned, they created a shell, and these dictionaries fit into it. And as Allyssa indicated, there's interest in creating additional dictionaries for other languages, to enable users throughout the world to contribute to collection materials that are often not in English, which is particularly the case for the Benson Latin American Collection here at the University of Texas at Austin.

What you see here on the first screen on the slide is a translation into Spanish. As you can see, the tabs have been translated, as well as the navigation at the top. And here you see it in Portuguese; again, the same controls have been translated into that language. That includes not only the tabs but also the navigational buttons, as well as the text that is consistent across the platform itself. Anything that is associated with particular projects depends on the collection owner, the collection administrator, or the project administrator; they are able to provide those translations for each of their projects. Here's another view of the workspace that Allyssa demoed earlier today, with the translation in Spanish.

Now, current work. As I mentioned, we're wrapping this up;
we're just finalizing some of the stray words that we didn't catch in the first pass. But in terms of current work, we are doing a couple of things. The first thing we are focusing on, and by "we" I mean a team of metadata experts at the University of Texas Libraries as well as Brumfield Labs, is figuring out how to bring in metadata along with the materials upon ingestion. Currently, the way that the system functions, you're able to bring some metadata in, but most of the time you have to provide it on the platform. This would enable us to upload a CSV of the metadata along with the assets to automatically populate the metadata for each object in the platform. Here you see an example of how we are bringing in a variety of metadata fields. We're working with both MODS and Dublin Core. There is going to be flexibility within the platform, though, for the collection or project administrator not only to rename the metadata fields but also to hide or show particular metadata in the platform.

One of the reasons why we wanted to bring in the metadata was also to facilitate browsing for our users. Currently there are only search fields available for each collection, where users can look for particular keywords. Of course, that's dependent on the metadata being there, as well as some of the transcription work having already started. With the metadata being ingested, our hope is that users will not only be able to search that metadata but also be able to filter and browse the collection through the filters that we have been creating this semester. As you can see, there's an example here of how a user can navigate through location and county. These are determined by the project or collection administrator; they decide what will be filterable in a particular collection. So that's work that we are currently developing for this platform.

What we're also doing, and this is in preparation for the
deployment phase of this project, is developing user guides. And by "we" it's primarily our relatively new graduate research assistant, Bryce McLean. He is taking our current user guides in English and expanding on them based on the new features that are being introduced. Once they're fleshed out and illustrated with screenshots, he's also going to be translating them into Spanish and Portuguese. This will come into use, as I mentioned, in the next phase, when we start working with our user communities.

Now, future steps. This work will begin largely in January and February, so it's coming up relatively quickly. First off, we're going to be deploying the tool with our communities. Part of the impetus, as Allyssa mentioned early on, is that we have been working quite a bit with international partners, Latin American partners in particular, through our post-custodial initiatives, which we call here Latin American Digital Initiatives. This is the landing page for that international collaboration. Besides making their collections accessible online, we also wanted to create a workspace for them to be able to transcribe and enhance the accessibility of these materials, not only for searchability but also for usability within their own communities. So this is really what we're hoping for: to leverage the potential of FromThePage to not only enable but also empower these Latin American communities to work with their materials and really be able to build on them efficiently.

We are working primarily with two partners that are part of Latin American Digital Initiatives. The first one is an archive in Puebla, Mexico, called the Royal Archive of Cholula. It's predominantly an indigenous archive that was created from the 16th century on, and it dates all the way to the 20th century. This is a project that we have been
working on in collaboration with faculty at the Meritorious Autonomous University of Puebla, specifically Dr. Lidia Gómez García. In the deployment phase, we will ingest some of these materials into FromThePage, and our partners will not only test the site for its functionality but also assess the language accuracy of the navigational buttons as well as the platform generally. As they're transcribing materials, they are going to be providing us feedback on anything that needs to be corrected, anything that might not quite make sense in terms of the language, as well as feedback on bugs that they might encounter throughout their work. As I mentioned, the first partner is going to be the university in Puebla, and that one is going to be a class of undergraduate history students who are focusing on paleographic work. The collection that they're going to be working on is mainly 16th and 17th century, so they're going to be doing some paleography there.

The second community we're going to be working with is EAACONE. EAACONE is based in Brazil, and it stands for Equipe de Articulação e Assessoria às Comunidades Negras do Vale do Ribeira. It's a community organization that actively fights for land rights for Afro-descendant Brazilians, primarily in the state of São Paulo. With them we're also going to be working on some of their handwritten documentation, to be able to test out the tool, again, for linguistic accuracy and functionality.

We're also, and this is likely going to come after their transcription work has been completed, going to be developing and finessing the FromThePage export features. Again, our goal was not only to enable our partners to build on their own collections but also to be able to reintroduce their contributions back into the archival record, not only the contributions themselves but also
include them and acknowledge them in the archival record as contributors to this intellectual work. Now, in terms of the features, we conducted the user study to get some feedback from the FromThePage community, to try to see what formats and file types would enable reuse as well as reintegration into the archival record. In collaboration with our metadata team at UT Libraries, we're going to be not only deciding on the outputs but also looking at the structure of the outputs so that we can best integrate them into our systems. What you see here is a small representation of the different repositories that we have here at the University of Texas at Austin Libraries. Each one of them has a different technology stack, and so part of the collaborative work with our metadata librarians is to figure out, again, what the outputs have to be and how they have to be structured in order to feed them into the systems. Part of that work will also be based on how we acknowledge the transcriber in the archival record so that we can ethically attribute the work, again, in the record.

And that's pretty much it in terms of the upcoming work for this coming year. On this slide here: again, this is a project that is first and foremost about acknowledging collaborative work, and so here we list all of our partners and our project team as well as the advisory board, anybody who has been providing feedback and work to this project. And last but definitely not least, we want to thank the National Endowment for the Humanities for supporting this grant project. Thank you.

Thank you, Albert. Thank you, Allyssa, for that really interesting presentation on this really fascinating project. We really appreciate you coming to CNI to talk to us about your work and show us how it's coming along. I'd like to open the floor now
for questions. Any attendees who would like to ask a question, please feel free to type it into the Q&A and we'll be happy to take that now. I'm interested in the work that you're doing with your partners. Logistically, how is that likely to play out? It sounds like a complicated process. Can you talk a little bit more about that?

Sure. With the university in Puebla, we've actually been working with them on this platform; this is going to be the third year that we've done it. Typically, in January we upload a batch of documents into the platform and I lead a training in Spanish for the students, and throughout the semester I support them as needed. So for our Puebla partners, it's an established relationship. For our Brazilian partners, this is the first time they're going to be doing this, and there's something that at first we didn't quite foresee: an issue of connectivity. This community in particular, for the most part, does not have internet access. There are a few members that do, and so we've had to coordinate with them to see who's going to be able to contribute to the project considering the connectivity issue. Luckily we've been able to address it for this particular pilot, but I think that's something we need to consider for the remaining year: to try to see if we can find a solution for our community partners that have that connectivity issue.

Oh, right. Yeah, that's a really important point. Allyssa, did you want to weigh in?

Just to say that the pandemic has affected the way that we plan to deploy this work. We had some travel planned, and I don't think that will happen. But thankfully Albert and I are both working in the realm of digital scholarship, and I wouldn't say it's been a seamless transition to working remotely, but I think we're ready to take on that challenge now.

Yeah,
good point, both of those. I just want to read a comment here from Liz Gushee, who writes: "Allyssa and Albert, thank you so much for this work. It will be such a useful model for us at the University of Miami, especially given our depth of archives for Cuba and Latin America." So that's great. Thanks for that comment. And now I see that Cliff has a question, so over to you, Cliff.

Okay, so here's something I'm wondering about. I've seen two kinds of crowdsourcing projects over the years. One kind really opens a resource up to very public crowdsourcing, and in order to deal with it at scale, it either builds in some kind of redundancy (for example, you get at least three people to transcribe the same piece and then check them to see if they're the same, or more or less the same), or you put a human reviewer in the loop, or in some cases you build statistical models of the behavior of people and build confidence in them over time. The other kind of model is where you pick your crowd in some way. For example, I've seen projects where they've enlisted classes of undergraduates, and you had mentioned an example of doing something along those lines for a collection. How are you thinking about the whole question of enrolling, qualifying, and quality controlling your crowdsourcing participants?

That is a great question. I'm the service manager for our FromThePage installation, so I see the way a lot of our collection owners are using it, and it varies from collection to collection. I'll let Albert address our community partnerships in a minute, but I've certainly seen the faculty member using it in the classroom with their undergraduates. For example, we had a faculty member who wanted to use some archival materials, and they had an in-person assignment set up, but it was actually starting to cause harm to the collection over the years. So they got it digitized, and now the students work
with those materials in FromThePage instead of handling the physical materials and causing further degradation to them. I can also hop back over to the live FromThePage demo, because I want to point out the version control that's available in FromThePage. Every edit that happens to a page in a document gets recorded in the history, and you can roll back to a certain version. I actually do some amount of spam control on our installation of FromThePage. We got hit in the summer of 2019 with a lot of fake accounts, and FromThePage.com got hit too, so we worked with Ben and Sara, and they developed some site-level management tools for dealing with that kind of thing: malicious links, or people just posting bot text, and things like that. The version control makes it really easy to roll that kind of thing back. But do you want to talk about the projects that you've worked on?

Yeah. In terms of recruitment, we've tried it all. It seems that, for us at least, the more fruitful avenue has been recruiting targeted audiences. In particular, we do a lot of partnerships with faculty and researchers in Latin America. We do a purposeful initiative with them, either educational or research oriented, and that typically has been successful. We've tried the transcribathon model; we've done it twice, with varying success. One time we had a dedicated cohort of paleographers, either three or four of them, but they produced quite a bit of work within the month's window. We did a time-limited event to emphasize the urgency of it, and that was productive too. But we found the most productive sessions and meaningful engagements, for us, to be collaborative sessions with faculty and researchers in Latin America.

That's really interesting, thank you. That's very helpful.

Thanks, Cliff, for that
question. There's also a request, again from Liz Gushee, asking for the DOI for the publication you mentioned earlier.

Yes, I'm going to stop sharing my screen in order to get that link.

And I see we're a little bit past the hour here, so Allyssa, while you are looking for that, I'm going to... there it is. Thank you so much.

It's not a competition.

Thank you so much to both of you, Allyssa and Albert, for coming and talking to us about your project. We'll look forward to seeing how it progresses. And thank you so much to our attendees for making time out of your Friday to hang out with us here at CNI. We will be back again on Monday, when Cliff will do his wrap-up of the week's project briefings before we launch into week two of our presentations. With that, I'll bring the recording to an end. Any attendees who wish to hang back and have a chat with Allyssa and Albert, or have any questions or comments, please feel free to do so; just raise your hand and I'll be happy to unmute you. Have a good weekend. Be safe, everyone, and be healthy. Take care. Bye.
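
The metadata ingest and filtering described in the presentation (a CSV uploaded alongside the assets, field names that an administrator can rename or hide, and administrator-chosen filters such as location and county) can be pictured with a small sketch. This is only an illustration under stated assumptions and is not FromThePage's actual code or API: the column names, the sample rows, and the helper functions `load_metadata`, `build_facets`, and `filter_records` are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample CSV: column names and values are invented for illustration.
SAMPLE_CSV = """identifier,title,location,county,internal_note
doc-001,Land petition,Austin,Travis,check scan
doc-002,Parish register,Austin,Travis,
doc-003,Community minutes,Round Rock,Williamson,
"""

def load_metadata(csv_text, rename=None, hidden=None):
    """Parse one metadata record per object, renaming and hiding fields as configured."""
    rename = rename or {}
    hidden = set(hidden or [])
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {
            rename.get(field, field): (value or "").strip()
            for field, value in row.items()
            if field not in hidden
        }
        records.append(record)
    return records

def build_facets(records, facet_fields, id_field="identifier"):
    """Group object identifiers under each value of the administrator-chosen facet fields."""
    facets = {field: defaultdict(list) for field in facet_fields}
    for record in records:
        for field in facet_fields:
            value = record.get(field)
            if value:  # skip empty values so they do not become a facet
                facets[field][value].append(record[id_field])
    return facets

def filter_records(records, **selected):
    """Return only the records that match every selected facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]

# An administrator renames "location" to "Place" and hides an internal field.
records = load_metadata(SAMPLE_CSV, rename={"location": "Place"}, hidden=["internal_note"])
facets = build_facets(records, ["Place", "county"])
```

Calling `filter_records(records, county="Travis")` would then narrow the collection to the two sample objects in that county, which is roughly the browsing behavior the speakers describe.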