Hi, everyone. Thanks for watching our presentation. I'm Abby Shelton, and I work at the Snite Museum of Art at the University of Notre Dame. I'm presenting with my colleague John Hartzler, a lead software engineer at Hesburgh Libraries, also at the University of Notre Dame. In this presentation, we're going to talk about our modular approach to building a collaborative digital collections ecosystem for our campus art museum and the library, which includes a rare books department and university archives. Since we only have about 15 minutes, I'm going to talk very briefly about some of our work to connect and engage with end users, and then I'll turn it over to John, who will focus on the connections we've made between our existing source systems to support the back end of this collaborative website.

To set a bit of context: in 2017, the University of Notre Dame received a grant from the Andrew W. Mellon Foundation to create a unified digital space for archival, library, and museum collections, so that users could access these materials all together, like you see on the screen. This was filling a real need for these campus units. The art museum had no online catalog for its collections, while the university archives and library had no centralized access point for their digitized items. Some of the library objects were spread across boutique digital exhibit sites, and a few collections were housed in our institutional repository. Our approach to this challenge was not to migrate everything into a brand-new system, but instead to build and strengthen connections across our human and technical ecosystem to make these digitized collections accessible and useful to our community.

Our goals for this project were pretty broad, as you can see on the screen. Basically, we want to get more digitized content out into the world in a way that's useful to people, to work better together as an art museum and campus library, and to experiment with some new-to-us technologies like IIIF and AWS. Because the grant didn't specify concrete technical deliverables, we went straight to our community to further define our goals, and we defined that community primarily as our faculty, students, and staff. You might hear us talk about Marble; that's the name we've given this project and the site, and it stands for Museums, Archives, Rare Books, and Libraries Exploration.

To begin with, I sat down with about 100 faculty, staff, and students, pre-COVID obviously, to ask them what they wanted from our digital collections site. Over the course of the project, we also talked to hundreds more about different iterations of the actual website once we started building. For this project, we really wanted to meet people where they were and address the real needs they had, instead of trying to engineer something based on the latest technology out there.

So I'm going to give you a quick, high-level sense of what our users really care about. Our users want high-quality images to use in the classroom, or to download for PowerPoint presentations or lecture notes. They want to be able to pull texts like books and manuscript letters into their preferred PDF readers, such as Adobe Reader, or to print them out for scholarship and use. They want to find images for event posters by searching for what I call first-order terms like "blue," "dog," and "family."
Museum users in particular want to be able to see what's on view in the galleries, or to revisit objects they've already seen in person, whether for personal interest or for a class assignment. And there are many more. These might seem like pretty foundational digital collections use cases, but they are our community's real needs, and they aren't currently being met for our campus collections. That's what we're starting from. We anticipate that these needs will expand after we launch the platform, and also as the infrastructure and interest in humanities computing grows on campus. That's one of the reasons we've taken this modular approach: we wanted to build in a way that lets us use and reuse our applications, and our data, to serve future, unanticipated needs. All right, that's the high-level human side of things. Now I'm going to turn it over to my colleague, John, who will talk about the back end.

Hello. Once we received this grant and sat down as a team to talk through the different types of problems, we realized we were going to have to address a number of challenges in our existing infrastructure and systems. First off, we have a lot of data out in the world, spread across a lot of different systems, and that data is in use. These are things like our library catalog system or EmbARK, the museum's system, but also spreadsheets that people have put together to support specific workflows, or FileMaker Pro databases. If we can help it, we don't want to force people to change their existing workflows to accommodate an entirely new tool and that tool's data needs. In addition, Notre Dame doesn't have a DAM; there's no place where our image metadata is consistently mapped. And even if we did have a DAM, it seems unlikely that all of the images we wish to work with would end up inside that system, given the plurality of systems we already have. Finally, all of this data is being used for a lot of different purposes, and it's incredibly difficult to guess what purposes will come up in the future. Next slide.

To handle this, we came up with a four-tiered approach. On the left here you can see the existing systems in play; that might be EmbARK, a spreadsheet, a Google Doc, or a FileMaker Pro database. We harvest from them, and at the top of the diagram we harvest two similar systems together. After that harvest, the data goes through a normalization process and gets stored in AWS AppSync. Then site-building routines run on top of that: they read the harvested data and use it to build outcomes. Those outcomes, on the far right, might be our IIIF infrastructure, the merged Marble website that you can go to right now, pushes into a preservation system, some other small collection website, or whatever purpose might come up in the future.
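To make that flow a little more concrete, here is a minimal sketch of the harvest, normalize, store, and build steps in Python. Everything in it, the function names, the sample record, and the in-memory dictionary standing in for the AppSync-backed store, is a hypothetical illustration rather than the actual Marble code.

```python
# Hypothetical sketch of the four-step flow: harvest -> normalize -> store -> build.
# None of these names come from the real Marble pipeline.

def harvest_embark():
    """Pretend harvest of one record from a museum export (made-up sample data)."""
    return [{"ObjectID": "1976.057", "Title": "Untitled", "Artist": "Unknown"}]

def normalize(raw):
    """Map a source record onto one shared schema used by every downstream builder."""
    return {
        "id": raw["ObjectID"],
        "title": raw["Title"],
        "creator": raw.get("Artist", ""),
        "source": "embark",
    }

# Stand-in for the AppSync-backed store; in practice this would be a GraphQL
# mutation against AWS AppSync rather than an in-memory dict.
store = {}

def save(record):
    store[record["id"]] = record

def build_pages(records):
    """One of several site-building routines that read only from the shared store."""
    return [f"<h1>{r['title']}</h1>" for r in records]

for raw in harvest_embark():
    save(normalize(raw))

print(build_pages(store.values()))  # ['<h1>Untitled</h1>']
```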
To take that big picture and zoom in on one very specific problem inside our actual infrastructure: our metadata and image problem. We have several different metadata source systems. EmbARK references its images, but several of our systems do not reference the images associated with their metadata records, so we have to figure out a way to bring those two pieces of data together. In addition, the different systems catalog different pieces of information, and they catalog it in different ways, so we have to find a way to normalize that, and sometimes the data sets are incomplete.

Here's an example of the architecture from the previous slide boiled down to what we're attempting to do with our metadata normalization process. On the left, you can see that we have processes for EmbARK, Aleph, and ArchivesSpace, where records are harvested and normalized. Then we have processes for the images, which live in S3 or Google Drive; those go through harvest, normalization, and enrichment. On the metadata side, we enrich the records by taking the Getty terms coming out of EmbARK and expanding them, doing the same with Library of Congress terms, and doing some location expansion as well. That data gets stored together in AppSync, and inside AppSync we connect the images to the metadata; that connection doesn't happen until the data reaches that third step. So step one is to pull the data and do some transformations on it as-is, and then it moves into the next phase, where things get merged together and reconstituted into something we can build on for the next step.

Another really big problem we have is multiple endpoints for our data. We have a lot of unique boutique collection sites that already exist and already have data behind them. When we analyzed them, we found that they don't actually have much overlap in their overall goals: each has specific, unique data needs, and all of them already have data sets and workflows in place. At the same time, we've created this large workflow for our metadata, and we want those two to come together into something that's maintainable for the future. We don't want to make people re-enter metadata for their microsites; we want to reuse the data sets we already have, and we don't want people to have to learn an entirely new tool just to manage a site that's been in existence for a long time.

It was really this problem that made us wary of the monolith approach. Monoliths tend to solve a specific problem at a specific time, but as problems change, or as you look at a whole suite of problems you need to solve over time, they can create a lot of technical debt down the road. They usually have a lot of code embedded in them and are pretty large, which makes it hard to make specific changes without breaking other things. So it's this multiple-endpoints piece that made us back up and say: maybe what we need to do is split our architecture into transformations that are specific to the endpoints they're trying to reach, and aggregate the same types of transformations at the same level.
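As a rough illustration of what "endpoint-specific transformations" might look like in practice, here are two tiny, independent build steps that read the same normalized record: one aimed at a website page and one aimed at a IIIF manifest. The record, URL, and field names are assumptions made for this sketch, and the manifest is heavily abbreviated rather than a complete IIIF Presentation 3 document.

```python
# Two small builders sharing one normalized record; each targets a different endpoint.
# The record, URL, and field names are illustrative only.

record = {
    "id": "1976.057",
    "title": "Untitled",
    "image_base": "https://images.example.edu/iiif/1976.057",  # hypothetical image service
}

def build_site_page(rec):
    """Builder aimed at the website endpoint."""
    thumb = f"{rec['image_base']}/full/600,/0/default.jpg"
    return f"<article><h1>{rec['title']}</h1><img src=\"{thumb}\"></article>"

def build_iiif_manifest(rec):
    """Builder aimed at the IIIF endpoint; a heavily abbreviated manifest."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{rec['image_base']}/manifest.json",
        "type": "Manifest",
        "label": {"en": [rec["title"]]},
    }

# Because each builder is its own small transformation, changing one
# endpoint never requires touching the other.
page = build_site_page(record)
manifest = build_iiif_manifest(record)
```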
So here is that architecture again, with a few different types of data sets added in. On the left, maybe we have an existing relational database that's being used for a particular microsite, or a FileMaker Pro database that's building another type of site, or people have data in spreadsheets used for very specific purposes, with existing workflows that are working for them. AppSync is the kind of tool we can harvest into from these systems; some of them we can connect to directly, and it allows us to query them across the board. Then we've set up individual site-building processes for each of those outcomes.

For example, we're currently working on migrating one of our sites, the one for the Inquisition Collection. It has an existing set of essays describing the material on that site, and those essays are already in a data format. When our process is done, we will have connected the new Inquisition website to the metadata that already exists around the collection itself and to another data source that holds the existing essays, and we'll reconstitute the site at the site-building level, reusing the data we already have. IIIF, and our IIIF endpoint, is another example of how this works: the same data that builds the Marble website also builds the IIIF endpoint. It just runs through a different kind of building process, one that builds manifests instead of building sites.

That's all nice, but we have another pretty difficult problem sitting on top of all that: sometimes we don't have the existing data that we need. Like I said, Notre Dame doesn't have a DAM, so we can't always connect images to the metadata they belong to. The systems we have for managing copyright are insufficient for the needs of the current website goals. Different sites want to display different pieces of data in different ways. How do we account for that? AppSync is really the key here, too. We created another process on top of AppSync, a tool we're calling Redbox. What Redbox basically does is allow us to pull specific data out of our data set, make changes to it, and write it back. It doesn't overwrite the data that already exists; it augments it and lays on top of it, so that when the site builder comes around and finds data that's been augmented by Redbox, it pulls the augmented data straight through, even though we haven't lost the existing data in the existing systems.

A lot of this has been set up because we don't really know what our long-term challenges will be, but we can guess at a couple of things: we have to keep working on our metadata transformations, and we need to keep working in a continuous-improvement environment. That environment lets us adapt our processes to the specific challenges in front of us, and because we've broken the infrastructure into these four different channels of data maintenance, we don't have to change the whole thing to make changes to part of it.

Thanks, John. That was pretty quick and pretty high-level, but we wanted to offer you our contact information. Our email addresses are on the slide, as is our microsite, so please feel free to get in touch if you have questions about what we're doing or want to see the site itself. Thank you all for watching. I hope you stay safe out there.
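As a closing sketch, here is a minimal illustration of the overlay idea behind the Redbox-style tool described above: edits live in a separate layer and are merged over the harvested record at read time, so the source-derived data is never overwritten. The storage shape and field names are assumptions made for illustration, not the actual Redbox implementation.

```python
# Hypothetical overlay pattern: harvested data stays untouched; augmentations
# are stored separately and take precedence at read time.

harvested = {"1976.057": {"title": "Untitled", "rights": ""}}
overlay = {}  # Redbox-style augmentations, keyed by the same record id

def augment(record_id, **changes):
    """Record an edit without touching the harvested data."""
    overlay.setdefault(record_id, {}).update(changes)

def read_for_site_builder(record_id):
    """What a site builder sees: the source record with the overlay applied on top."""
    merged = dict(harvested[record_id])        # start from the source-derived record
    merged.update(overlay.get(record_id, {}))  # overlay values win where they exist
    return merged

augment("1976.057", rights="Copyright status under review")
print(read_for_site_builder("1976.057"))
# {'title': 'Untitled', 'rights': 'Copyright status under review'}
print(harvested["1976.057"])
# {'title': 'Untitled', 'rights': ''}  (the original data is unchanged)
```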