 get started. Welcome everyone and thank you for joining us. I'm Cliff Lynch, I'm the Director of the Coalition for Networked Information, and I'll be introducing the session. This is one of the project briefing sessions from week three of the CNI fall 2020 virtual member meeting. Week three, which concludes on Monday with a sort of synthesizing summary that I will be doing, focuses on standards, infrastructure, and technology. And I just want to note that there are a number of pre-recorded videos as well as videos of the synchronous sessions that are available, and please avail yourself of those as it's helpful. As to this session, it's being recorded and the recording will be available to the public subsequently. Closed captioning is available; if you'd like to use it, please do. There is a chat, and please feel free to use that as we go along. There's a Q&A tool at the bottom of your screen that can be used to pose questions at any point during the presentation. We'll address all the questions, or as many of them as possible, after we hear from our two speakers. Diane Goldenberg-Hart from CNI will moderate the Q&A at the end. I'd like to introduce our two speakers. We have Esmé Cowles from Princeton and Mark Matienzo from Stanford University. Many of you will be familiar with the IIIF standard and some of the tools that implement it. But we're going to take, I think, a slightly different perspective here. We're going to hear from two institutions who are using this as a sort of strategic part of their operational infrastructure. So we'll be hearing much more, I think, from the institutional perspective than from the perspective of here's a standard or here's a tool. And I hope that people will find that to be a useful and informative perspective. And if I've mischaracterized this, I will certainly learn the error of my ways in the next half hour.
I would just note that there is also a recorded update on some of the recent work with the IIIF standard, which might serve as a helpful complement to this. And with that, I just want to thank our two speakers and turn it over to Esmé to start the presentation. Thanks, Cliff. So I'm Esmé Cowles from Princeton University Library, and I'm going to start us off by talking about Princeton's digital library ecosystem and how we use IIIF to integrate content across our applications. So the starting place for me in where IIIF is useful is the problem of monoliths. This is a long-standing problem in software development that basically says that the larger an application grows, the more functionality it has, the more complex it becomes, the more interdependency grows within it, and the harder it becomes to maintain. And I'll just point out one particular area, especially for web applications: when you have different kinds of users with different kinds of needs, that can also be a key driver of complexity. So there are a lot of different approaches to solving that at various levels in the stack, but one approach that I want to talk about in particular is the Hydra approach. The project has since been renamed Samvera, but the original name was Hydra, based on the metaphor of the Hydra having one body and many heads, where the body was a single digital repository, in this case a Fedora repository, and the heads were Rails applications that provided different kinds of use cases for different kinds of content or different kinds of user needs. And I found that to be a pretty good model, but there are definitely some challenges with it. One of those is that given the scope of functionality in the Fedora repository, there's not really a user interface for staff to build content or ingest items, and so that tends to be implemented in each of those Hydra heads.
And so you sort of have multiple implementations of some of your ingest and workflow type functionality, or it gets pushed to other systems entirely. It also encourages silos a bit. When you have a maps application, you tend to need to know that you want a map and to go to the maps application in order to find a map. So I'd like to suggest a slight refinement of the model, a trunk and branch model, which organizes things around the different users and how they use your systems. I find that a lot of different aspects of applications flow from those users: things like authentication, what workflows are important to them, what kind of discovery criteria they want to use to identify items, the scale of traffic that the application needs to support. And so in this model, we have a base, which is pretty much like the old base, where we have a repository at the bottom with different kinds of workflows or digitization processes that funnel content into a single repository. Up at the top, we have multiple applications, in this case different branches meeting different kinds of user needs. And in the middle, we have a trunk. And for us, that trunk is IIIF. Now, I know most of you will know what IIIF is, but I'll just give a quick primer in case anyone doesn't. IIIF is the International Image Interoperability Framework. As the name suggests, it was originally developed for images, and it is a set of APIs for working with digital content. It's moved on from images to include other kinds of content, including audiovisual materials and others. The first two APIs are the Image API, which is a low-level API for transforming images so that you can store one kind of master image, but clients only need to download the bits that they need. So if you're showing a thumbnail of an image, you only need to download a 200-pixel thumbnail, not the 100-megabyte master image that you store as part of your digital library.
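That tiling behavior comes from the fact that an Image API request is just a structured URL with region, size, rotation, quality, and format segments. Here is a minimal sketch of building such URLs using Image API 2.x syntax; the host and identifier are hypothetical placeholders, not Princeton's actual endpoints:

```python
# Build IIIF Image API (v2) request URLs:
#   {base}/{region}/{size}/{rotation}/{quality}.{format}
# The host and identifier below are hypothetical placeholders.
BASE = "https://iiif.example.edu/iiif/2/yosemite-map"

def image_url(region="full", size="full", rotation="0", quality="default", fmt="jpg"):
    """Assemble an Image API URL from its five path segments."""
    return f"{BASE}/{region}/{size}/{rotation}/{quality}.{fmt}"

# A 200-pixel-wide thumbnail: the server scales the master image on demand.
thumb = image_url(size="200,")
# The full master image, only fetched when a client actually needs it.
full = image_url()
```

A viewer showing a thumbnail requests only the `200,` size, and a deep-zoom viewer requests individual tiles via the region segment, so the multi-megabyte master never has to leave the server.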
And then the Presentation API is sort of a higher-level API that builds on that content by bundling together content with enough structure and metadata to drive a digital object viewer, something like Mirador or Universal Viewer, that provides a rich interactive experience. So in Princeton's digital library ecosystem, we have a lot of different staff who are building different kinds of digital objects, all in one repository. That's our Figgy repository, which is a Samvera repository. And then we publish that content using IIIF and embed that content in a variety of different applications meeting different user needs. And so we definitely see this as a tree where IIIF is the trunk, the main way of distributing content. So now I'd like to walk through sort of a case study of one object in our repository, which is a map of Yosemite. Because it's a physical map, it shows up in our library catalog with other kinds of print and digital resources. Because it's part of a thematic collection, it shows up in our digital collections showcase for that collection. And because it's a map, it shows up in our maps portal along with other maps and geodata. So in our repository, our staff users are mostly searching by identifier, because things are tied either to our Voyager ILS or to our finding aids, and so everything has a metadata identifier that's the primary key. And you can see that we have a bunch of facets in our repository for things like which collection something is in and what kind of work it is, but also things like the workflow state. Has this been approved and published, or is it still in need of work? Is something open and public, or is it private? What kind of rights statement does it have? Who deposited it? Things like that that our staff really care about. Once you're looking at the object, you can see the digital object viewer, so you know what the users are going to see.
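The Presentation API manifest mentioned above is a JSON-LD document that bundles the image services together with labels and structure for a viewer. A minimal sketch of one, following the Presentation API 2.1 model; every identifier here is a hypothetical placeholder:

```python
import json

# Minimal IIIF Presentation API 2.1 manifest: one canvas painted with one image.
# All URLs and identifiers are hypothetical placeholders.
manifest = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://iiif.example.edu/manifests/yosemite-map",
    "@type": "sc:Manifest",
    "label": "Map of Yosemite",
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [{
            "@id": "https://iiif.example.edu/canvas/p1",
            "@type": "sc:Canvas",
            "label": "recto",
            "height": 4000,
            "width": 6000,
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": "https://iiif.example.edu/canvas/p1",
                "resource": {
                    "@id": "https://iiif.example.edu/iiif/2/yosemite-map/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    # The embedded Image API service lets viewers request tiles.
                    "service": {
                        "@context": "http://iiif.io/api/image/2/context.json",
                        "@id": "https://iiif.example.edu/iiif/2/yosemite-map",
                        "profile": "http://iiif.io/api/image/2/level2.json",
                    },
                },
            }],
        }],
    }],
}

# Any IIIF viewer pointed at this JSON can render the object.
serialized = json.dumps(manifest, indent=2)
```

Pointing Mirador or Universal Viewer at this one JSON document is enough to render the object, which is what makes embedding the same item in the catalog, the showcase, and the maps portal cheap.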
But you also have things like functionality for designating what area the map covers, or seeing the fixity status to check that the content's okay. There are widgets for things like moving things along in the workflow to publish them, and, you know, buttons and forms throughout for editing and managing the content. In our library catalog, the items look a lot different. The facets are really driven around how do I access this material? What is the publication year or subject matter area? People are searching by keywords and titles and things like that, so the search functionality is really driven by that. And throughout, there's a lot of access information about how do I get to this item, since that's the main concern there. So when you look at the item, you see a link to the digital object, but you also see a lot of metadata to identify it and get some context. And you see access information, like a button to request access to this item in our reading room. And on that same page, we embed that same digital object viewer, in this case the Universal Viewer, so that users can get access to the digital content right there. As I mentioned, there's also a digital collections showcase. We have a number of collections that are thematic or related to physical exhibits, and so curators, you know, build categories and explanatory text and context so that users can navigate and browse the material. And the map shows up there with other maps and other related items. And then the last place I wanted to call out was our maps portal. There are some browse options for a couple of big categories of content, but the main interface is the map on the right, where you can click and drag and pan and zoom to select content. So you can zoom all the way in on Yosemite Valley, and this map shows up along with our other Yosemite maps. So for us, the key benefits of this approach are that we get a little bit of centralization.
We get to build our digital objects in one place. We get to reuse and share that functionality for ingesting content and approval-based workflows, and have that sort of one repository. But we still get to reuse our digital objects in multiple places. If something's a map and it's a physical item and it's in a digital collection, it should be able to show up in all of those different places. And IIIF lets us do that in a sustainable way, where we can have one digital object that can live and be accessible in multiple different places. And because we're using IIIF, it can be embedded and used in other people's applications too. And so with that, I'll go ahead and hand it over to Mark. Thanks, Esmé. So my presentation is going to be very complementary to Esmé's, but I'll be talking about things from my perspective. I'm Mark Matienzo. I'm the assistant director for digital strategy and access at Stanford Libraries. And so a lot of what I'll be talking about is really this access and reuse piece. I won't be talking quite as much about the repository management piece, but I think a lot of what Esmé talked about is similar for us. So for us, one of the really important things that I think you all should take away from my presentation is that IIIF is really a central way to integrate digital content across and beyond our institution. Next slide, please. So IIIF allows us a lot of flexibility. I think it's useful to think of it as a trunk or as a backbone for us in terms of what we can do to deliver image-based content as well as A/V content at Stanford. There are three major ways that this trunk or backbone benefits us. Again, these are complementary to Esmé's points, but are really from my perspective in terms of building out and building on our access environment. First, it allows us to support both evolving needs and technical concerns.
And these are supporting user needs, specifically researcher needs, as well as organizational priorities, in addition to things like maintenance and technical longevity. The second is that it provides us this trunk or backbone to leverage content both across the organization, so not just Stanford Libraries but Stanford University, as well as outside of Stanford. And then third, and this will be more of my sum-up point, but also a common thread throughout my presentation, is that it allows us to seamlessly integrate that content in a number of different contexts. So next slide. So like a lot of institutions, one of the things that our repository delivery infrastructure has to do is to provide a generic image viewer as a means to easily get repository resources into downstream environments, so it's offered by our delivery environment. Unsurprisingly, and if you've seen past presentations about IIIF at CNI over the last, say, nine years or so, one of the things that we found is that maintaining an image viewer is actually really challenging. Before IIIF, or even in the early days of IIIF, many institutions still relied on institution-specific or otherwise bespoke code to provide a rich image viewer. Over time, what we've really been able to leverage at Stanford is a separation of concerns. By using IIIF, we have this common technical backbone relying on specs developed by the IIIF community. It's allowed us to manage change and add features, but it also allows us to think about how we support different versions of the specifications over time, and therein it allows us a lot of flexibility. All the while, this also continues to allow us to leverage and support IIIF. So this has really led to an evolution of our viewing environment. So if you click through to the next slide, this was our image viewer as of 2014, which we just referred to as the image viewer.
It relied on IIIF, but it was very, very simple, with kind of limited functionality in terms of zooming, and not great from a user experience perspective. If you click through to the next slide, you'll see what we called our Image X viewer, which was designed in 2015. Again, this was a Stanford-specific implementation that was really intended to, you know, address some of the usability and accessibility concerns of our previous image viewer. If you click through again, in 2017 we rolled out the use of Universal Viewer, which is an open source IIIF viewer that is really kind of optimized for things like document presentation, or looking at sequences of images like this. And then if you click through to the next slide, in 2019 we rolled out Mirador 3. Stanford, along with Princeton and a number of other institutions, such as the Bavarian State Library, the University of Leipzig, and Yale University, contributed to Mirador 3. And so we really felt that Mirador 3 was the best of breed in terms of IIIF viewing experiences. So we have a simplified version of Mirador 3 that we use in our delivery environment, which has been tested for accessibility and really, you know, addresses the core needs and the functionality that we need to provide. Next slide. Okay. Esmé also talked about a couple of different models of software architecture and repository architecture and why they don't always work well. In particular, you know, he touched on the fact that monolithic repositories can make it really difficult to address different kinds of user needs. For us, we've also recognized that this is the case for image viewing experiences. We've realized that we need to provide this generic viewer, which I showed you on the last slide, but we also need to support distinct kinds of behaviors depending on context. So, for example, comparison viewing.
We're also looking at ways to leverage IIIF as a means to pull our image content into an interactive experience for hybrid digital exhibits, or those with both digital and physical components. So if you click to the next slide, this will be very similar to what Esmé showed you before. This is a generic image viewer that's embedded in the context of our catalog. And if you click through to the next slide, this is the same object in our maps portal. So again, there's some context-specific information that relates to the discovery of maps and other geodata in this context. If you click through again, we're also able to bring a Mirador image comparison view into certain platforms, such as our online exhibits platform and other contexts as we need to. And then if you click through further, this is just an example of this Sanborn map, or another Sanborn map, in the context of a narrative exhibit. So again, we're able to leverage the same viewer across these different contexts. In addition, if you click through, you'll see an example of us using Mirador 2 for image viewing for the Parker Library on the Web project, which we work on with Corpus Christi College, Cambridge. And if you click through again, we're also able to leverage IIIF to easily generate derivatives in an automated fashion for these image files. So at no point do library staff actually need to resize images to get this nice layout for this exhibit. So next slide. In addition, we have also leveraged IIIF to be able to add new features. The two things I want to talk about really quickly here are searching within items in full text and the emerging need to support authentication for resources. These are both cases where there are newer IIIF specifications, the IIIF Content Search specification and the IIIF Authentication API, that support these use cases. So if you click through, this is an example of the Universal Viewer in a document-viewer-specific context.
What we've been able to do here is build out an implementation of the IIIF Content Search API that allows us to index text that's produced using optical character recognition for this resource. We can then provide basically a "search within" feature for this particular document in the viewer here and show highlighted results. If you click through to the next slide, this is an example of a resource that requires authentication. This is material from a mechanical engineering course. Before you're logged in, you see a low-resolution version, and if you are affiliated with Stanford, you're able to log in and see the full document at a higher resolution. We're also able to actually prevent download of full-page images by only serving up tiled regions of the images in this presentation. Next slide. So in addition, one of the things that we're also starting to see is that it's actually really important to have a diversity of implementations for the work that we do. Unsurprisingly, we've invested a lot in Stanford Digital Repository as a campus-wide service, but we're also starting to see additional implementations on campus. At the same time, we also know that over time there is this wonderfully massive pool of IIIF content. I think the latest estimate I saw was over 400 or 500 million images available via IIIF. So if you were to click through, this is an example of a collection not held by Stanford Libraries, but held by one of our campus partners, the Cantor Arts Center. These Andy Warhol contact sheets have actually been digitized and are stored in Stanford Digital Repository, and they can be presented in the context of this exhibit. Again, we're using the IIIF Authentication API, because these are art images and there are intellectual property concerns, and we're able to provide restrictions that meet the terms of the Andy Warhol Foundation's requirements for online access.
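The "search within" feature described a moment ago follows the IIIF Content Search API: the viewer sends a query to a search service advertised in the manifest and gets back annotations whose targets carry canvas coordinates for highlighting. A hedged sketch of both sides, using Content Search 1.0 shapes; the service URL and the response document are illustrative stand-ins, not Stanford's actual service:

```python
from urllib.parse import urlencode

# Hypothetical content-search service that would be advertised in a
# manifest's "service" block.
SEARCH_SERVICE = "https://iiif.example.edu/search/doc-123"

def search_url(query):
    """Build a Content Search API (v1) request for a 'search within' query."""
    return f"{SEARCH_SERVICE}?{urlencode({'q': query})}"

# Illustrative response: an AnnotationList whose resources target canvas
# regions (the #xywh fragment), which the viewer uses to paint highlights
# over the OCR'd text.
sample_response = {
    "@type": "sc:AnnotationList",
    "resources": [{
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "resource": {"@type": "cnt:ContentAsText", "chars": "engineering"},
        "on": "https://iiif.example.edu/canvas/p4#xywh=100,200,300,40",
    }],
}

# Pull out (matched text, highlight box) pairs for the viewer to draw.
hits = [(a["resource"]["chars"], a["on"].split("#xywh=")[1])
        for a in sample_response["resources"]]
```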
If you click to the next slide, this is an example of another implementation on campus, at another institution. The Hoover Institution Library and Archives currently supports IIIF as well, and they've been doing some work on their repository infrastructure to better support this. This allows us to have a broader pool of content that's not only provided through implementations supported by Stanford Libraries. And then if you click through one last time, this is an example of how we've leveraged external content. For the Sanborn Fire Insurance Maps online exhibit, as you can probably imagine, Sanborn maps are available in many of our collections. They've been digitized in many, many places. And especially because of the age of Sanborn maps, many of them are in the public domain. Because of this, there's a massive pool of Sanborn map content available via IIIF. So this is an example of an online exhibit where we're actually able to import an external IIIF resource. This is a Sanborn map held by the Library of Congress that our team at our maps library used to produce this exhibit. So it's been great for us to integrate these external resources as well. Next slide. So all in all, I think for us it's really important to remember that IIIF, as well as an additional specification called oEmbed, provide these nice seams for us to integrate content across our different platforms. This allows us a great deal of technical consistency as well as visual consistency. In addition, I think this really allows us a greater return on our investment in digitization and in development of our platforms and infrastructure. This gives us a lot of versatility. It gives us a pool of content to work with, and we can do a lot of different things in more imaginative ways with this common infrastructure. And so, next slide. Thank you. And I will hand it over to Diane to see if there are any questions for you all. Thanks so much, Mark.
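The external-content reuse described here works because a manifest is just fetchable JSON at a URL: any institution's manifest can be dropped into a local viewer or mined for its image services. A sketch of extracting Image API endpoints from a remote Presentation 2.x manifest; the network fetch is omitted and the tiny manifest dict below is an illustrative stand-in for downloaded JSON, with a made-up external host:

```python
# Walk a IIIF Presentation 2.x manifest and collect the Image API service
# endpoint for every canvas, whether the manifest is local or held by
# another institution. The manifest below is an illustrative stand-in
# for JSON fetched over HTTP.

def image_services(manifest):
    """Yield the Image API base URL for every painted image in a v2 manifest."""
    for seq in manifest.get("sequences", []):
        for canvas in seq.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service")
                if service:
                    yield service["@id"]

remote = {
    "sequences": [{"canvases": [{"images": [{
        "resource": {"service": {"@id": "https://external.example.org/iiif/2/sanborn-sheet-1"}}
    }]}]}],
}

endpoints = list(image_services(remote))
```

Once the endpoints are in hand, the exhibit platform can request exactly the derivative sizes it needs from the remote server, which is what makes the automated-derivative layout described earlier possible without staff resizing images.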
And thank you, Esmé, and just a special congratulations to Esmé for driving the slides while holding a cat. I think that takes a special talent, and you pulled it off very adeptly. So thank you for managing both of those. The benefits of 2020. All right. But seriously, thank you for that really interesting presentation, to both of you. It's fascinating to see how many different options you have by implementing this kind of a strategy. And the floor is now open for questions. So to all of our attendees, please type your questions into the Q&A box now for our speakers, and I'm sure they'll be happy to address them. I personally would be curious to know if anyone in the audience has experience doing something similar or is thinking about doing it. Please weigh in with your comments and questions. And while we're waiting to hear from our attendees, I was just curious to know from both of you, Esmé or Mark, are there instances where this is not an advisable course of action? Are there things that you would not recommend employing IIIF for, that it's not particularly suitable for? I mean, I will say the biggest thing that comes to mind for me is that it's a lot of different pieces to put in place. And so if you're just doing one project, it's overkill. And so I would say a monolithic approach for a small application is probably a better approach, if you just have one repository and one front end. IIIF inside of a single application, especially the Image API part to do the pan and zoom, can be useful. But having multiple different applications really only comes into play when you have a couple of different front ends that you want to access the same content. That makes sense. Yeah, I would agree with that. I think the other thing is, while there is this IIIF Authentication API, I think one of the things that's important to remember is it's really intended to support authentication, or integrate with authentication systems.
So it's not going to tell you how to implement what the terms of those authentication systems are. You need to have a deep understanding of what your risks are, whether that's privacy, whether that's copyright. In our case, we have a well-defined model for how we apply our access rights for repository materials. Got it. Okay, thank you very much. I appreciate that. And I see we do have some questions. The first is from Anne Helmreich, who asks, with respect to the OCR and full-text search, have you hit any roadblocks with respect to the different qualities or generations of OCR? Yes. Our initial implementation, while it was intended to be part of our infrastructure, was for a specific project where we had a small pool of content with tremendously high-quality OCR. We have been producing OCR using ABBYY FineReader, the server edition, for a number of years as part of our digitization workflow, across a couple of versions. And the quality really varies. Mid-century materials, especially typescript materials, the quality tends to be pretty abysmal. So this has been an interesting effort for us in terms of trying to assess that. It's still not perfect. It's a service that we're still refining, but it is one that is in high demand by our bibliographers and curators. I think the other thing I'll note is that there have also been a number of transcription projects, using FromThePage in particular, and we're working now to integrate those human-produced transcriptions into that same content search environment. Thanks, Mark. Did you have anything to add to that, Esmé? No, I'll second what Mark said, though. I do feel like OCR quality varies dramatically, and it really is a challenge. Yeah. Okay. All right. Thanks for that question. And now from Emily Gore, who asks, have either of you leveraged this in ArchivesSpace or an IR, an institutional repository, or do you plan to do so?
And I see that Esmé would like to respond to that one. Yeah, we haven't yet. We have a custom finding aids application at Princeton now that embeds a IIIF viewer. We're currently migrating our finding aids to ArchivesSpace and expect to go live with that either late this year or early next year. And as part of that, we're building a new front end that's based on the ArcLight framework, which will continue to embed that viewer. We do plan to have our repository sort of add digital content links to ArchivesSpace when we ingest items that are linked to the metadata identifiers. That work is still to be done, so I'm not sure when that will be ready, but making that link so that you can navigate between the discovery platform, the repository, and the finding aids management system, in all of those permutations, is definitely something we really want to get working, to allow that sort of flexible navigation and reversal of those relationships. I would add just really quickly that, at least for us, Stanford Digital Repository is, I hesitate to use this word, sort of a generic repository. So our IR-like features are also built into SDR. We do have a couple of kind of distinct deposit environments, for example for electronic theses and dissertations. While this would not really apply to contemporary ETDs, mostly just because most people are submitting PDFs, we do have some historic thesis scanning projects where, when the materials are scanned in from paper, let's say, or scanned in from microfilm, when we have images, we can deliver those theses via IIIF. Again, we're in a similar boat, although earlier on than Princeton: we're working on an implementation of an archives discovery system based on ArcLight.
I do know that some of our colleagues at the University of Edinburgh have integrated IIIF presentation directly into the ArchivesSpace public user interface, and Emily, feel free to reach out to me and I can put you in touch with the folks there that have been working on that. I think there are a couple of other places that have done that too, in terms of actually bringing a IIIF viewer directly into the ArchivesSpace public user interface. Great. Very interesting. Thank you. I appreciate those questions, and the answers from both Mark and Esmé. Thanks so much. And I do notice now that we're slightly past time, so I want to be respectful of everyone's time. A big thanks to Esmé and Mark for coming to CNI and sharing this approach with us, and also to our attendees for making time out of your day, and to the cat. Thank you so much. We're going to hang around a little bit and chat after I turn off the recording. So any attendees who are still with us who would like to have your mics turned on, just raise your hand. I'll be happy to do that. Join the conversation with us here now. And otherwise, I'll wish you all a good rest of your day and hope we'll see you back at CNI soon. Thanks so much. Bye-bye.