Time to get started now. My name is Diane Goldenberg-Hart. I'm with the Coalition for Networked Information, and I want to welcome you here today to this webinar, which is part of CNI's Spring 2020 Virtual Membership Meeting. I'm so glad you could join us. Today we're going to be having a presentation from members of the staff at the Claremont Colleges, who are going to be speaking with us about collections as data: digitizing and making discoverable California water documents. In particular, they're going to be speaking on the webinar topic of flows of water, flows of work: strategizing workflows for data discovery in the digitized Southern California water documents. Our speakers today will be Jessica Davila Green, Janine Finn, Yatesi Iletsko and Mark Buchholz. Before I hand it over to our speakers, I just want to let our attendees know about a couple of features of the webinar environment. If you look at the bottom of your screen, you should see a little Q&A button. If you click on that button, a box should pop open, and you can type your questions or comments in that Q&A box at any time. After the presentation is over, I will come back on and moderate those questions so that our presenters can respond live. I also want to draw your attention to the chat box, again another button at the bottom of your screen. We'll be using that chat box to share URLs and some other information with you. You can also feel free to use that chat box at any time. So without further ado, again, I want to welcome all of our participants, and I want to welcome and thank our panelists for being here today. I believe Jessica is going to get us started, so over to you, Jessica.

Thank you, Diane. Hi everyone, thank you for attending our presentation. I'm Jessica Davila Green, director of digital strategies and scholarship at the Claremont Colleges Library, and I have a few colleagues with me.
Each of them will introduce themselves as they start their portion of the presentation. Today we're going to introduce our division, our roles, and the work related to the digital library, its workflows, and an exciting grant-funded project we're working on, as well as how we envision this project influencing the future of the digital library, its workflows, and its usability. The Digital Strategies and Scholarship division was officially launched in fall 2018. An iteration of tech divisions of the past, it pulled together the functional digital work that was previously distributed across other library divisions, bringing together the Claremont Colleges Digital Library, the institutional repository Scholarship at Claremont, our data and digital scholarship services, and our systems and technology experts, all under one division. The new division seeks to leverage the backstage technical expertise and IT infrastructure along with the front-stage faculty and student relationships and collaborations of our data and digital scholarship folks. This includes identifying creative and collaborative ways to improve our digital library and further integrate our digital collections into the research, teaching, and learning of our seven-college community. The Claremont Colleges Digital Library was launched in 2006 with a primary objective of increasing discoverability of and access to our most requested special collections while reducing the handling of the analog materials that were served up in the reading room. At the most basic level, the goal of creating digital surrogates was to produce a one-to-one match, digital to analog, for viewing purposes.
The digital library has lived on OCLC's CONTENTdm since its inception, while the work of the digital library has moved across the library through various divisions. For the last three years, however, it was located in Special Collections and Libraries under our fabulous Western Americana librarian, Lisa Crane, who's also one of the founders of the digital library. Fourteen years later, with increased growth and interest in GIS, a five-year Mellon grant that sought to establish digital humanities across our five undergraduate colleges, as well as advances in digital library platforms, we've reached the perfect storm of opportunity to envision and experiment with our next-generation digital library. Now that I've given a little background, I'm going to hand it over to Janine, and she's going to talk about our Collections as Data project and the opportunities it brings with it for the future of the digital library.

Hi, thank you, Jessica. My name is Janine Finn. I am the coordinator for digital scholarship and data science at the Claremont Colleges. I'm going to talk a little bit about this particular project and how it is the sort of seed that's going to inform some of the work that we do within our digital production and digital library sides of things. A few of you, at least, are probably familiar with the Collections as Data initiative. It's sort of a bunch of theories and also a specific project.
The Collections as Data project was first funded as an investigation by IMLS, I believe in 2015, and then in 2017 the PIs received funds from the Mellon Foundation to fund two separate cohorts working in two different phases of the project. The focus of these cohorts and these smaller sub-grants was to look at cultural heritage collections that had already been digitized by libraries and archives and often had really particular and kind of amazing local significance. But many of these collections were still locked up in digital formats, big, big, big image files that weren't very conducive to any kind of computational approach or searching across platforms. The goal of the Collections as Data grants is to both improve the computational accessibility of these already digitized collections and also surface the history of marginalized communities that may be buried in some of these older archival arrangements. Next slide. This project has three primary deliverables; this is across all the participants in the cohort. In addition to the collection as data, which is the data set, the files that we're actually turning into computationally accessible data, we are also expected to create and share an implementation plan and a use plan. These are documents of how we did what we did, who our stakeholders are, how we engage with them, and how the collections get used. This is part of why we really like this grant: these documents force us to be more reflective and thoughtful as we go and to think about what we're building, and not just, you know, build it and they will come, but build it engaging our communities as we go, so we're actually meeting our scholars and our community members where they need to be. Next slide. Thank you. Our collection at Claremont Colleges, the one that we're focusing on, is the California Water Documents Collection.
For this collection, ours at Claremont along with six partner institutions across Southern California, we were the recipients of a 2017 Hidden Collections grant from CLIR, which, I think the deadline for the next batch is today, so if you're thinking about it, jump on it. There were over 40,000 items in this collection, and there are about 14,000 that got digitized as part of this grant. They are materials that relate to late 19th and early 20th century water development in Southern California, and these include government documents, field journals, maps, charts, photographs, and hand-drawn little tables and diagrams. They're all documents mostly from white settlers moving into California and moving water around to build farms, to develop land for housing, all those kinds of things. Buried in all these images and scans are some really rich sources of data that can help us understand the environmental history, the geology, the climate, and the biology of Southern California. And we're really relying on our faculty partner, Char Miller, who is a professor of environmental analysis at Pomona College. He has been using the physical materials in special collections for many years with his classes, and he is our sort of faculty informant, helping us dig out some of the really rich pieces of this collection. What's missing, and what is the focus of this grant, what's really hard to find in the traditional archival arrangements, are the stories of the Indigenous people. You go and do a search and you don't find what should be here. So we are working with Native scholars and community members, the Tongva in Southern California along with some of the more eastern tribes in the desert. We're working with some of them to figure out what they need to know, what their scholarship has already been in this area, and what kinds of things can help them surface these hidden histories. Next slide.
This is a new grant. We just got this grant in January, so we're still sort of figuring the first stages out. We're doing some assessment of these collections and thinking about tools and project timelines. We'd also planned on doing a good bit of outreach to our local academic community and our broader community. Of course, under a shelter-in-place order that's really tricky; all the in-person meetings and focus groups we planned to do, we haven't been able to do. I've had a couple of meetings with tribal community folks and had a few Zooms and a few emails. We're working on it, we're figuring it out as we go, and I'm happy to talk more about that. But yeah, so we're working on that planning and assessment phase and also the outreach phase, adapted to the COVID pandemic, and I'm going to hand off to my colleague Mark Buckholtz, who's going to talk a little bit more specifically about the current workflow and some of the really awesome nitty-gritty.

Thanks. Hi, as others have mentioned before me, my name is Mark Buckholtz, and I'm the digital production assistant at the Claremont Colleges Library. I just now realized there's a bit of a pun in "the current workflow" as it pertains to water documents, so I'll go with that and say that was intentional. Next slide, please. So, our workflow as it stands: I'm going to talk briefly about the digitization workflow that was done as part of the CLIR grant that Janine mentioned, go over what we do in terms of description, and then look at what sorts of things we do for data sets in particular, more specifically the dataset tagging and the OCR process. Next slide, please. So for digitization, I would be remiss if I didn't mention that our digital production unit relies heavily on a team of dedicated student assistants. We usually have anywhere between 9 and 12 student assistants in a non-COVID semester, so they do a tremendous amount of work and we really appreciate them.
For capture, we've used primarily flatbed scanners, with copy stands for oversized materials. For those of you familiar with the FADGI guidelines, we try to follow those as much as possible, and those were the standards that all of this material was digitized under for the CLIR grant. Editing was minimized to maintain authenticity: primarily rotation and cropping, documentation of any capture artifacts, and then making sure the files are named and exported to their proper archival master format. For textual materials this tends to be PDF/A-1b, and for image materials we go with TIFF. OCR processing for those textual materials is done through Adobe Acrobat, and then there's a quality control step there to make sure that no undocumented image artifacts make it through to the final file. We verify the file characteristics, such as color space, resolution, and all of that, and then we do what I call a singularity check for multi-page documents, meaning simply that each page is digitized exactly once, with no duplicate or missed pages, so that we have an authentic representation of the physical original. Next slide, please. The description is really where I think that this Collections as Data project is going to yield results.
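As an illustrative aside to the transcript, the "singularity check" described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the talk does not say how the check is actually performed (it may well be done by eye), and the function names, the idea of identifying pages by integer sequence numbers, and the known expected page count are all hypothetical.

```python
from collections import Counter


def singularity_check(page_numbers, expected_pages):
    """Return (missing, duplicated) page numbers for one scanned document.

    page_numbers   -- hypothetical list of page sequence numbers found in the scan
    expected_pages -- page count of the physical original (assumed to be known)
    """
    counts = Counter(page_numbers)
    # Pages of the original that never appear in the scan.
    missing = [p for p in range(1, expected_pages + 1) if counts[p] == 0]
    # Pages that were captured more than once.
    duplicated = sorted(p for p, n in counts.items() if n > 1)
    return missing, duplicated


def passes_singularity_check(page_numbers, expected_pages):
    """True only if every page 1..expected_pages appears exactly once."""
    missing, duplicated = singularity_check(page_numbers, expected_pages)
    # Page numbers outside the expected range also fail the check.
    unexpected = [p for p in set(page_numbers) if not 1 <= p <= expected_pages]
    return not missing and not duplicated and not unexpected


if __name__ == "__main__":
    # Page 3 was skipped and page 2 was scanned twice.
    print(singularity_check([1, 2, 2, 4], expected_pages=4))
```

A check like this only covers page counts; the file-characteristic checks mentioned in the talk (color space, resolution) would need a separate tool.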
For our metadata creation, we work with the special collections library to establish data dictionaries ahead of time for the project, making sure that the metadata is entered consistently. Then we bring the images into OCLC's Project Client, where we can leverage automation with the metadata template to populate common fields such as collection, title, file format, these sorts of things. We're also leveraging the controlled vocabularies to make sure that the appropriate fields are entered consistently according to LC subject headings or the name authority files. Once the students finish entering the metadata, the records get uploaded to the CONTENTdm server for approval. And then for the data sets, we are looking at tagging those objects in which we find a data set, to be able to reveal them later. Next slide. Quality control happens on the CONTENTdm server, so usually someone who didn't enter the original metadata record reviews those records. This is usually either a full-time digital production staff member like myself or Yasi, or one of our senior-level student assistants. Once everything checks out, we approve those records and they get published to the digital library. Next slide, please. For data sets specifically, we are working on some low-hanging fruit. Primarily, looking more intensely at the collection, we needed to establish what constitutes a data set, because we saw, by exporting a list of all records that were tagged as containing a data set, that we needed to create more granular descriptions of what that means. If you look at the spreadsheet here on the right-hand side, you have the data set type, size, and notes. We're actually adding more granular descriptions of those data sets, not just to say that it is a data set, but what kind, a chart or a map, and how much of it is a data set.
So what we've learned from this is that the amount of data can vary greatly from record to record. You have some where you have a 50-page document where only a third of a page is a chart, and then, vice versa, you have an eight-page document that is essentially a booklet of graphs and charts. So the so-called density of the data set per record is something we're finding has a great degree of variability. And based on the legacy records, there seems to have been an inconsistent use of the data set tag. We're finding a lot of records that have data sets that weren't tagged as such, and vice versa, records that were tagged as data sets which may not necessarily be data; it might be a sketch or a drawing or something like that. Next slide, please. So what are our next steps? Obviously we want to improve the data set tagging to expose all of this rich information to our researchers and our users, so being able to bring that more granular level of description into CONTENTdm, I think, would be one of the deliverables for this project. In addition, we want to do some OCR enhancement. ABBYY FineReader is something we're looking at acquiring in order to capture more of the structure of each document and open that up to more automated processes and programmatic access to that data. And I think with that I will hand it over to Yasi.

Thanks, Mark.
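As an illustrative aside to the transcript, the two findings Mark describes (widely varying data "density" per record, and inconsistent use of the data set tag) can be sketched in Python. This is a sketch under stated assumptions, not the project's actual tooling: the `Record` fields, the function names, and the sample records are all hypothetical, and in practice these values would come from a manual review recorded in the team's spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str           # hypothetical identifier for a digital library record
    total_pages: int         # pages in the digitized document
    data_pages: float        # pages (or fractions of a page) holding charts, tables, etc.
    tagged_as_dataset: bool  # whether the legacy record carries the data set tag


def density(record):
    """Fraction of the document that is data, from 0.0 to 1.0."""
    if record.total_pages == 0:
        return 0.0
    return record.data_pages / record.total_pages


def audit_tags(records):
    """Return (untagged, mistagged) record ids.

    untagged  -- records that contain data but lack the data set tag
    mistagged -- records that carry the tag but contain no data
    """
    untagged = [r.record_id for r in records
                if r.data_pages > 0 and not r.tagged_as_dataset]
    mistagged = [r.record_id for r in records
                 if r.data_pages == 0 and r.tagged_as_dataset]
    return untagged, mistagged


if __name__ == "__main__":
    records = [
        Record("report-50pp", 50, 1 / 3, False),  # a third of one page is a chart
        Record("booklet-8pp", 8, 8, True),        # essentially all graphs and charts
        Record("sketch-1pp", 1, 0, True),         # tagged, but only a drawing
    ]
    for r in records:
        print(r.record_id, round(density(r), 4))
    print(audit_tags(records))
```

Recording a per-record density alongside the tag is one way to let a researcher filter for data-heavy items rather than wading through every tagged record.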
My name is Yasi Letchko. I'm the Digital Technologies Coordinator at the Claremont Colleges Library, and I'm just going to talk briefly about the future directions for the digital library. I'll touch on four components of that: our platform, workflows, metadata, and collaboration. For our platform, we will soon be upgrading to the CONTENTdm responsive site, which will improve the user interface and will support some new functionality such as IIIF integration, and we're hoping that will make our collections a bit more dynamic. But we will also be conducting a platform review soon for a potential migration, and we are hoping that the Collections as Data project will help to inform the needs and limitations of our current platform. We'll be looking at the functionality that we are currently lacking that could help us make data-rich materials more visible and usable, while also responding to our user needs. The bulk of our collection consists of image files and PDFs, although we do have some AV material as well, but we will want to consider whether there are other formats that we may want to ingest. We'll need to look at how we will be connecting content to other platforms, and in the case of datasets that might be Dryad, so that would be a conversation that we will have with Janine. If we are linking to other data and data sets, having a platform that allows for that integration without it being a barrier in our workflows would be ideal, although we haven't explored this type of functionality in CONTENTdm yet. Also, our reporting tools have been a bit of a challenge, so better reporting tools will definitely be on our list.
Moving on to workflows, we will definitely need to rethink our workflows. We expect that, inevitably, through this project we will need to adjust what that looks like. Aside from that, we are working towards implementing digital preservation tools and preservation storage, all of which will need to be taken into consideration along with what emerges from this project. So we will definitely need to adapt, and potentially adopt, different workflows. Our conversations with content appraisers will be really important to ensure that data-rich materials don't get lost in the shuffle, so to speak. Being able to flag items before they are added to our digital collections would be great, but we'll also need to train our students to spot these types of materials as they're adding them to the digital library. So the work that Kenneth, our current student assistant, is doing can really help to inform what that process could look like. We would be interested in automating workflows programmatically where we could, but that of course would require some further development of staff skills. Metadata, of course, is a very important part of the digital library. This project will help us to answer the question of how we can improve our metadata to find objects with data that can be used computationally. Another important aspect is inclusive metadata. For example, in these water document collections we have several maps with racist names in them, and we've had discussions on how we should approach this, but we really need to have established best practices for when we find items with offensive language and the steps that we need to take to address that. We also want to be mindful of the need to create access points by using terms that are culturally appropriate to describe objects.
So to address these types of issues, we're in the process of forming an inclusive metadata task force in the library, with the goal of creating best practices for inclusive and equitable metadata. And lastly, collaboration is very important to us, both internal and external. For internal collaboration, that means continuing to partner with Jeanine and her work with data and digital scholarship, potential collaborations with digital humanities folks, working with our librarians across the library, and continuous outreach and collaboration with the seven colleges, which isn't always easy, as Jeanine mentioned, especially now that we are not on campus. For external collaboration, we would really like to learn from our peers at outside institutions about what they've been able to accomplish and how they are addressing these issues. So speaking with other libraries, librarians, archivists, etc. will be really important. That's just a really quick overview of the types of things that we are thinking about with the digital library, and the conversations we are having and will still need to have as we plan for the future of our digital collections. So with that, I'll hand it back to Jeanine, who will talk about the next steps in the Collections as Data project.

Thanks, Yasi. Yeah, as we keep talking about, and I'm sure we've heard throughout these presentations, adjusting our workflow for this work environment has been a constant way to keep us on our toes. Take some of the metadata work that Mark explained, looking more granularly at these records and figuring out what's there. That had, in our original thinking, been a little bit further down the path, but suddenly we're all, you know, working from home and trying to keep everybody busy and trying to keep the project moving forward. So that piece became a little bit more foregrounded, and we're moving a little bit further on that.
But in the big picture, we're, like everyone, looking forward to a time when we can be back in the library and on campus, engaging with our communities and checking back in with our community partners. That's the part that we've had to sideline a little bit. One of the things in our immediate future is Pomona College's Humanities Studio event next year. Pomona College, one of the Claremont Colleges, has an annual theme in their Humanities Studio, and next year's theme is indigeneity, and our faculty partner Char is one of the faculty fellows in that group. So we're hoping to leverage some of Pomona's events and networks to do some more outreach for our project next year. We'll also be working with some of our Native community members online in the meantime, ahead of that. They've got a few maps. The Tongva folks we've been working with have an ArcGIS map that a grad student created for them, and then the student left and it's sort of been left dangling. If you've worked with many GIS projects, you know that can happen: it ends up on a platform and nobody knows who's got what pieces of it. So we're going to try and help them with some of that stuff. Beyond this Collections as Data project, which goes until July of next year, in the bigger picture we're hopeful that some of the lessons from this project, as Mark and Yasi and Jessica have all said, will help us develop even more computationally accessible collections, and more representative, authentic, and inclusive collections, from our digital heritage collections at the Claremont Colleges. We have a long history at Claremont, and we have a lot of historical materials related to the natural sciences.
Pomona's been around since 1887, doing geology and botany and recording data about the natural world in Southern California, and a lot of it is locked up in notebooks and analog formats in cabinets. This data could really help inform more research about climate science, environmental science, and environmental justice; we just need to get it out of those notebooks. Next slide. So I'm just going to wind up, and I think I can share this link to our Open Science Framework repository. This is where we are putting our Zotero bibliography and our working documents, and we will have some scripts and some data dictionaries in there as we go. Anything else that we come up with will wind up there. I think that's it. Thank you so much, and we're happy to take any questions.

Thank you. Thanks, Janine, and thank you to the rest of our panel. It was really an interesting talk, an interesting project that you have underway there, and an interesting constellation of moving parts, and we appreciate you coming here to share them with us. Thanks also for that OSF link, Janine. I think we'll probably add that to the project briefing page where, as I think we chatted out earlier to everyone, we will include a link to these slides, and we'll embed the video of this presentation there as well. We also shared the GitHub link, which is already linked from that project briefing page. So, as Janine said, we are now opening the floor for questions. Please feel free to share your questions with us by typing them into the Q&A box, and we'll read them aloud here, or type them into the chat box and we will share them as well. I see that we have a question now from Sarah, who asks: are there plans to ask tribal members to be part of the inclusive metadata group? You had mentioned inclusion of that group in parts of your project; how about the metadata in particular?
Yeah, that's a really good question. We're at the preliminary stages of forming the group, but yes, definitely, we will be doing outreach to community members, and I think that's a really good idea, to ask tribal members to be a part of it, not ongoing, but maybe, you know, as invited to participate in that process.

Great, that sounds wonderful. Thank you, Sarah, for that question. How do you make those connections? Do you have people on campus that help you liaise with the community? Do you have trusted partners already? How does that happen?

So far it's a lot of connecting the dots, right? We do have someone on campus, a scholar, Julia Bogany. She's a Tongva elder, and she teaches about water history and Tongva history generally. She is affiliated with Pomona College and, I believe, Pitzer College, and teaches regularly, so she's on campus a couple of times a week. She hosts these water talks with several different tribal historians across Southern California, where they tell the stories of where springs were, where people lived, and how they used water. So she is aware of who's in these conversations, and she's been just super helpful for us to connect with. We also have some connections through Cal State San Bernardino. They were one of our project partners on the CLIR Hidden Collections grant, and they host a water resources collection. Their archivist there, Susie Earp, is connected with the Cahuilla people in Palm Desert, east of LA, and she's got some connections there with some of their tribal anthropologists and some of their other historians. So that's how we're reaching out to those Coachella Valley tribes. But it's hard right now, because everyone's just struggling with getting through what we're getting through. So yeah, we're trying to make ourselves available for conversations, but not ask for anybody's time right now, because it's a difficult time.

It sounds like
even under the best of circumstances, it takes a lot of legwork to make those inroads with those communities.

Yeah.

I was also wondering about the funding stream, given the crisis right now and the fact that, obviously, your workflow has been disrupted and you're on a grant. Is that an issue that you've had to contend with, the timetable and funding?

Our funding is fine. We've been given a little bit of an extension. The original time frame for the grant was to start January 2020 and go through April 2021; they pushed it back to July 2021. And we did an update. The main PI is Thomas Padilla, who you probably all know, from UNLV, and they're listening to all of us to see, like, do you want more time, do you need more money? So I think there's going to be a little bit of flexibility with this particular grant, because everyone's been, you know, turned upside down. I'm not sure about the bigger picture with the library and other projects. I don't know.

Okay, yeah, because I also was thinking about some of the next steps that you want to implement, with some more tagging and OCR. That's probably going to be a lot more student time, I would imagine. Is that what you're thinking of? So I imagine that'll be impacted as well. Well, it's really interesting. I'm so fascinated by this project, and we'll be really, really interested in watching how it moves along. Thanks so much for coming to CNI and sharing it with us; we appreciate it. If you're working on something similar, if you're curious about anything, make time, or if you want to share some of your insights from your own experience with that. Thank you again, everyone. Take care, be well, and we hope to see you back at CNI soon. Bye bye.