Our intention today is to start a conversation with you around the question of whether the current infrastructure model for a national digital library is the best that we can do. Specifically, that model is distributed repositories with a central metadata index. That's generally what we're talking about today. At the bottom of this slide is a URL for a Google Doc that we'll be using later to record feedback. There are some prompt questions in that document, and I'll bring it up again on the last slide, but feel free to fill it in as you go.

In the current model, the Digital Public Library of America harvests only thumbnail images and metadata from member repositories. Display objects remain at those repositories. When we talk about display objects, we're talking about the viewable image that's generally about 800 pixels on the long side. I want to make sure that distinction is clear: we're not talking about high-resolution archival objects, okay? We're only talking about that middle point between the thumbnail image and the high-resolution file. It's just a display object.

There are some challenges that we've seen with the current model. Chief among them is that Google and other search engines will not index websites that host only thumbnail images and metadata; they view them as link farms. Google wants its users to get to the object immediately. It does not want the user to take one or two more hops after finding an item in a central index. That lack of indexing by search engines hinders public access to content and hinders DPLA's ability to refer users to member repositories. But the structure is also a burden on every repository. We are all maintaining and managing these repositories, so there's a big duplication of resources. We all need system administrators.
We all need database administrators, or some portion of them, website developers, designers, and so forth. And each repository must address its own search engine optimization issues. Some of us do that well; some of us don't do that very well. Actually, I'd say most of us don't do that very well. Most of us don't have the resources to assign an SEO FTE, or even a half-time FTE, to monitor the continually changing conditions. So the question that we sought to answer with a little pilot project we did earlier this year is: would Google index more effectively if DPLA hosted display objects? Or was there something inherent about the DPLA platform that was preventing Google from indexing? So I'll turn the rest over to Michael.

Hello, everybody. This is actually my first time here, so thanks for having me. Kenning brought this issue to our attention probably several times by raising his hand and shaking it vigorously. Clearly we all want our content to be used, and he sees us as a place where that can be accelerated. So thanks to Kenning for raising this issue. In terms of this experiment, it was something we were happy to collaborate on. It took a couple of tries before we got something conclusive, but I think, based on the methodology we used, it is more conclusive than our failed experiments earlier. So, next slide.

We set up two domains with identical software on them: hiking-experiment-control.dp.la, a very generic interface, and hiking-experiment.dp.la, also a very generic interface but with the enhancement of actually having that full-frame image that Kenning talked about. We brought a very small number of images into this, and they're actually Kenning's vacation photos, because he put his skin in the game, which I think shows a lot of commitment. And to be clear, the reason we did that was that initially we tried to use photos that Googlebot had already seen on other sites, and we had a lot of trouble getting Googlebot's attention.
So we brought both of these sites up, with identical metadata, a small thumbnail on one and the display image on the other, and we submitted those sitemaps to Google. We didn't do anything else for SEO. We didn't change the metadata. We didn't advertise the site anywhere or link to it from anywhere. This was meant to be a very controlled experiment, as much as you can do a controlled experiment on SEO, because you're living in this ecosystem of the web where things ebb and flow. Over time we monitored the crawling process in Google Search Console, which is not immediate; even with a small number of images, Googlebot doesn't necessarily do all that in one day.

Here you can see two screenshots of a particular item page, and this is literally Kenning going on a hike. I personally took these photos, so we didn't have to embarrass poor Kenning with a close-up of his face or anything like that. But you can see that, as an end user, if you land on the thumbnail version of this site, you don't really get an impression of what you're looking at. It's our understanding that Google is trying to bring people to the most canonical or most user-satisfying version of the content, and is doing things across domains to make sure that it can do that. So if this content is copied 15 times, and one version is much better than the others, performs better, has better page structure, or whatever, Google will pick that one as the canonical.

So we built these sites at exactly the same time, submitted them at the same time on the same day, and just sort of raced them. And our result was that the control version of the interface only had eight of the hundred or so objects indexed by Google, and then it just gave up. Whereas the vast majority of the content was indexed on hiking-experiment, which was the non-control, or experimental, version of the site. So I don't know if you want to broach these questions. Sure.
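The sitemap submission described above is just plain XML following the sitemaps.org protocol. As a minimal sketch of that step, with an invented domain and item paths rather than the experiment's actual URLs, a sitemap for a set of item pages can be generated like this:

```python
# Sketch: build a minimal sitemap.xml of the kind submitted to Google Search
# Console for each experiment domain. Domain and paths here are illustrative.
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(item_urls):
    """Return a sitemap XML string with one <url>/<loc> entry per item page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in item_urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical item pages on the experimental domain:
urls = [f"https://hiking-experiment.example.org/item/{i}" for i in range(1, 4)]
xml = build_sitemap(urls)
```

Once a file like this is published at the site root, it can be registered in Search Console, after which indexing progress per URL can be monitored there.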
It wasn't the resort vacation. So yeah, these are the prompts that we want to start with. Again, that URL for the Google Doc is down at the bottom. In fact, I can just switch over to the Google Doc, there we go. So feel free to fill it in, or feel free to step up to the microphone and have this conversation with us. Basically, what we're trying to get at is: is there a way for us to overcome what I see as our reluctance to move display objects to a central metadata repository like DPLA? Why do we continue to want to manage our own individual repositories? Those of you who have talked to me in the past know that I've been talking about this with the institutional repository landscape, too. Not having things in a central repository makes it difficult for us to tell, first of all, what we have collectively. It makes it difficult for us to gather analytics and gather use and performance data. So what is it that's keeping us? Do you think this is an okay idea? Do you think this is something that we might be able to pursue collectively?

About 20,000 or so. But you don't want my photographs. Oh, please. Do you, okay. Do you want to step up to the microphone? Thank you.

I'm Jennifer King, head of Special Collections at Emory University Libraries. Just to be sure that I understand the problem that you've presented, I'm wondering how other aggregated digital libraries have dealt with this issue, and I'm thinking specifically about Europeana. That would just help me understand if there is a model that has incorporated the images that you're identifying.

Yes, good question. Would somebody mind closing that door back there? It's a little hard to hear. Thank you. That's a really good question, and I think there are people here who know a lot more about Europeana than I do. But Lorcan, do you know the answer to this question? I mean, I think Europeana is distributed as well, isn't it?
Yes, I'm seeing some nodding. And hosting the full-frame image? I don't think they have the conservative thumbnail policy that DPLA has, but they don't hold the preservation copy; that's my understanding. A broader answer to your question: there are examples where we have collectively built something. HathiTrust is a great example. ResearchGate is a commercial example, a massive institutional repository that far outweighs what we have in our distributed institutional repositories; academia.edu, too. But among nonprofits like us, I don't know of other examples where they have centralized this. Yes, sir?

Bruce Heterick with JSTOR. One of the things we've been doing over the last two years is opening up the JSTOR platform to allow people to put their digitized special collections on the platform. And they've asked us a lot why we would take the metadata and the objects. This is precisely the reason: because this stuff gets indexed in Google and gets found. I think that being in the research workflow is really important from that perspective. So collectively, I think there's a way to get at what you're wanting to do. We're getting close to 1,000 collections on the platform, from about 200 institutions. And all those institutions want us, as we grow, to be able to syndicate the content back out to DPLA so it can be put into a more national place. So I think collectively there's an interesting way to do this, whether you have to be a channel, as JSTOR is trying to think of itself, or whether you're a repository yourself. One of the things we've noticed, though, is that there's a reluctance: libraries, academic libraries in particular, do not always want to share the objects. They want to point back to the source of truth at the institution, which is understandable.
So I think that's always, in some sense, going to be a barrier for folks wanting to push stuff out to a national, central repository, because we're even struggling with that for some institutions in the little experiment that we're doing.

Yeah, thank you so much for that. So I stand corrected; I was not aware that JSTOR was doing this, and kudos to you for taking that model. It's that reluctance that I'm trying to get at. Why is that? We are maybe one of the last industries that has not fully moved over to this cloud-based kind of model. We don't have many Wikipedias, right? We don't have many Facebooks. Lorcan, please, rescue me.

Well, first of all, I'd argue that we actually do have lots of Facebooks and Wikipedias. They're just more localized and focused in their nature, so we're not aware of them unless we're part of them. The other thing, and I'm just thinking this through: with the experiment, I'm concerned, because you raised the issue that these needed to be new, unique photos to index properly for the experiment anyway. But part of what you're proposing is that the archival, highest-resolution image still already lives somewhere in some repository and therefore has presumably already been indexed. So I'm not sure you achieve the outcome you're looking for through this intermediate step. I'm not saying you don't, but the experiment doesn't really get to that. And then the other piece is the preservation piece, right? There's an institutional commitment to preservation that we've made by virtue of having our institutional repositories. And because it's local content, there's a local commitment. Nationalizing that commitment feels like losing control of the responsibility that the organization has taken upon itself. So I feel like that's part of the impetus, or the resistance, rather. Yeah. Thank you.
And to be clear, we are not advocating that step; it would be a much bigger step to try to aggregate all of the archival, full-resolution materials. That's another ball of wax. I think what would have to happen is that locally, those repositories would be dark repositories. We're only talking about the publicly available materials, and most repositories already keep the high-resolution files in a dark place that's not publicly accessible.

Lorcan Dempsey, OCLC. I think there is an interesting question, historically, about archives of this type of cultural heritage, special collections material: why we have very specialized examples of aggregations, but not more general ones. Europeana, I think, is a special case because of the political support it's had and the reasons it was set up, and one wonders, longer term, where Europeana will be. I think one of the factors is that when you think about these types of material from a research point of view, a researcher typically will want to prospect comprehensive resources, or will prospect lots of individual repositories. When you look at digital aggregations, they tend to lack that sort of topical or disciplinary or place coherence, in terms of comprehensiveness. They quite often tend to be a somewhat random collection of resources, which may be very good for teaching and learning purposes, for undergraduate research purposes dealing with primary resources. But from a more general research perspective, they don't provide the confidence that one might require that one is prospecting the full range of what's available, and I think that's been a major issue. You didn't mention, I mean, if you compare the library digitization-aggregation type of activity with the more targeted commercial endeavors of ProQuest or Gale, who tend to go for deep and narrow rather than broad and shallow, I think there's a very important dynamic there.
The JSTOR approach is interesting as well, in the context of whether the JSTOR platform has that gravitational pull and puts these things into the workflow, which it probably will. I think one interesting thing about it, though, and Bruce will correct me, is that it's going for collections rather than individual items, because there is some coherence and interest in the collection. Whereas, again, many aggregations just go for individual items, losing the context of the collection, which can be quite important. So I think the JSTOR focus on collections is probably sensible, but even so, it still has the issue of higher-level comprehensiveness within particular subject areas or other things. So I think it's very interesting and fascinating. And I'm sure that the Kenning Arlitsch photographic archive will be a major contribution to knowledge and should be preserved in many places.

I knew I was going to get grief for putting up my own photos.

Sarah Shreeves from the University of Arizona. There are others in the room who probably know far more about this, but I wondered about the role of IIIF in thinking about this, knowing that Europeana, I think, is doing some large-scale experiments there. Is there an SEO problem with that? Okay, but it would be interesting to hear more about that, since there's a lot of investment in the community, of course, with that as a goal, to be able to have images move around, right? So, thanks.

Yeah, I think based on my research, which is essentially just asking questions on the IIIF community list, the sense is that Googlebot does not render the IIIF viewer when it hits an item page. So folks who are using IIIF viewers on their item pages tend to have a representative service copy, and then a button you can click to see the item in the IIIF viewer. It's certainly great UX and certainly very useful for a number of reasons, but it's not an SEO strategy so far.
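The "representative service copy" pattern just mentioned is easy to support when images are already served over the IIIF Image API, because a static display derivative is simply another Image API URL. A minimal sketch, where the service endpoint is hypothetical but the `/full/800,/0/default.jpg` path follows the Image API URL template `{base}/{region}/{size}/{rotation}/{quality}.{format}`:

```python
def service_copy_url(image_api_base: str, width: int = 800) -> str:
    """Build a IIIF Image API URL for a static, crawlable display derivative.

    '800,' requests a width of 800px with height scaled to preserve the
    aspect ratio; 'full' takes the whole image region, unrotated.
    """
    return f"{image_api_base}/full/{width},/0/default.jpg"

# Hypothetical image service endpoint for one item:
url = service_copy_url("https://iiif.example.org/iiif/photo42")
```

An ordinary `<img>` tag pointing at a URL like this is plain HTML that a crawler can index without executing the deep-zoom viewer's JavaScript, which is why sites pair it with the viewer rather than relying on the viewer alone.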
Things could change quite a bit, though. Anyone else? Any other thoughts?

Hi, good morning, everybody. Eric, UC San Diego. I've been thinking hard; this is an interesting question that I haven't really thought about before. But I wanted to pick up on something Lorcan commented on, around the value of the collection context, which I think is where you're going. It's making me wonder about the DPLA data model and the limits of aggregation. And I'm thinking of the challenge that libraries have had moving from encoded archival description finding aids to fully robust digital archives of collections. Could I invite you to reflect on what opportunity or challenge you see with DPLA and that collection framing?

So are you specifically mentioning the DPLA data model as being item-based, or the DPLA schema, so to speak?

I think I'm asking maybe a more philosophical question, which is whether a strategy for success here is something a little bit more collection-focused that maintains that context. And Kenning's making me think about HathiTrust, and whether DPLA reframed as a HathiTrust-oriented sort of endeavor would take us somewhere different. I mean, they build collections out of the content they have, so there's no need to dive deeply into the data model. I'm wondering if that would be a fundamental shift in how DPLA thought about how it aggregated and managed content.

Yeah. So, starting at a very basic technical level, trying to get Google to index 45 million or more pages is a tall order regardless. If we started taking a collections-based view of the UI, at least to some degree, what we would be able to do, I think, is create a page that seemed like it was built around a theme, the theme being that collection, which might have better SEO than an individual item, particularly since a lot of description is done at the collection level.
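One way to make that collection-level narrative legible to search engines is to embed it as structured data on the collection page. A sketch using schema.org's `Collection` and `hasPart` terms; the collection name, description, and item URLs below are invented for illustration:

```python
# Sketch: schema.org JSON-LD for an aggregator's collection page, so the
# collection's narrative description travels with links to its member items.
import json

def collection_jsonld(name, description, item_urls):
    """Return a JSON-LD string describing a collection and its member items."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Collection",
        "name": name,
        "description": description,
        "hasPart": [{"@type": "CreativeWork", "url": u} for u in item_urls],
    }, indent=2)

jsonld = collection_jsonld(
    "Hiking Photographs",  # hypothetical collection name
    "Snapshots from a trail survey, described at the collection level.",
    ["https://example.org/item/1", "https://example.org/item/2"],
)
```

Dropped into a `<script type="application/ld+json">` tag on the collection page, this gives a crawler a themed, described unit rather than an isolated item record.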
We do gain access to some of that, depending on the provider, but things like OAI-PMH tend to serialize the leaf record metadata, and the collection metadata is over there, so we're missing that context in a lot of cases. As it stands now, DPLA is not meant to be the canonical representation of the metadata of the object, but more like a union catalog that gives the end user a way to search and discover things and then go to the source page. So if we did approach this from a collections level, there would be retooling involved. We would have a different metadata stream to gain access to, but it might end up making more sense to Google, in the sense that there might be narrative description at the collection level that we could drop on a page full of items from that collection, which would make that page much more SEO-friendly. Thank you.

I'm Kate Dohe, from the University of Maryland Libraries. I think a lot about this question in a very different way, perhaps, and I don't know that I'm all the way to a question or a thought here, but digital libraries are fundamentally social justice mission projects, right? I think there are plenty of reasons that many people in this country may not necessarily want to participate in a national digital library. And I'm wondering whether you're considering some of the social justice implications of moving away from the distributed repository model, where we can be more responsive to local communities, where we can engage very directly in partnerships, and what it would mean if that switches to a national model.

What was the last part? It would switch to a what? A national model. Yeah, that's a big question. I do agree with you that digital libraries are a social justice issue. Providing access to these materials to people who can't get to them otherwise is a fundamental mission of digital libraries.
Here, the national model that we're talking about is not a top-down model like what Michael suggested Europeana is. What we're suggesting is that we do this collectively. So I don't know what the social justice objections would be. I'm looking at this purely from a traffic standpoint; that's what I'm trying to address, what I'm interested in addressing. We don't see a lot of referrals to our hub from DPLA, and that's what I'm trying to increase. And in that way, I think it would address social justice issues as well, because this stuff is invisible if it's not in search engines, unfortunately.

I should say that we have worked with groups from underrepresented communities who have liked the fact that DPLA doesn't try to be the canonical representation of their content; they maintain control, and we link to them. That's absolutely already the case. My personal understanding of this problem space is that participation in DPLA is voluntary, and that there are probably a variety of different approaches to how groups might want to work with DPLA, so there won't be one comprehensive way of doing this. It might be sort of an a la carte menu approach. Roger.

Thank you for stimulating such a terrific discussion. I'm Roger Schonfeld from Ithaka S+R. I see this as a subset of a wider set of questions about discovery; others have framed things somewhat in those ways as well. I just wanted to bring in a little bit of what I see happening in the publishing space, the commercial publishing space in particular, with discovery, because I think it may shed a little bit of light here. The publishers, for more than a decade, had been insisting that all the traffic had to come to their websites as the place where discovery would link to, right?
And so whether it was your index-based discovery service or Google Scholar or whatever, their mindset was that you draw the traffic toward ScienceDirect or Wiley Online Library or whatever it may be. What's been developing in the last couple of years is a recognition that that's an unnatural way to try to move traffic around, right? And so they've taken the approach that the traffic, the usage, the scholars are at places like ResearchGate. So there's this change in mindset that's emerging, around syndicating content away from the publisher platforms and toward sites like ResearchGate, and I'm sure there will be others as well in the future. For better or for worse; you can unpack it in a variety of different ways. But I think what's interesting about that is that the questions of why the content is trusted, how we know that it's preserved, and where it is used are starting to decouple from one another in that kind of model. And I think I'm seeing some elements of the discussion here that suggest that maybe there's an opportunity for us to think about special collections in some of those ways also. How do we know it's trusted? Who's curated it? That doesn't necessarily need to be the same thing as whose website you get to it at. So there are some really interesting threads; I'd point to a parallel there. And I wonder how that broader set of changes in discovery is informing your thinking here, or if there is a resonance there that you see as well.

Thank you, Roger. Clearly I need to talk to you more; this is really a very thoughtful analogy. Again, from my perspective, we're not trying to say that people need to go to DPLA, right? But we know where people go. People go to search engines, and that's what we're trying to address here.
We think that if the objects are in DPLA, we can do a better job, as you saw from the experiment, of getting search engines to index the content there. And once it's there, it's going to drive a lot more traffic. And issues of branding and identity can all be addressed in a cloud environment, right? So yeah, that's a very thoughtful analogy. I appreciate that. Thank you. Michael, do you want to say anything about that?

Yeah, I think one model of the purpose of DPLA is that we are sort of this corpus callosum, drawing links between different representations of an item and its home repository. We're engaged with the Wikipedia community to bring some of our objects to Wikimedia Commons so they can be placed in Wikipedia articles. And maybe the purpose of the DPLA item page is as much a representation of the image as a place tracking where projects like that have brought things. Umbra Search is another place that houses objects that came through DPLA. We've been working with Digital Pasifik, which is a project to aggregate records of Pacific Islander peoples, and they brought some of our records in, records that have been held in the States but describe cultures in the Pacific. So I feel like our field's ability to work on the SEO problem is going to become more and more of an issue as the table stakes for SEO become more and more onerous, and commercial entities are always going to have more resources to throw at it. That is a problem I personally want to work on, and I want to help folks do appropriate things whenever I can. I don't know what the right model is for us, and I don't want to say to everyone that the only way we can build a better future is for you to give me all your items; that doesn't feel comfortable to me.
But I am happy, in whatever way, to be part of that solution. And, just quickly, I'm wondering how many people know what the Lighthouse scores of their repositories are. Does anybody know? Do you know if you have someone at your institution who knows what the Lighthouse scores are and is tracking them? Okay, good, someone. That's something that I know I don't have the bandwidth to maximize, but I feel like, as a field, we need to do a better job of propagating the information that, hey, there are metrics you need to be hitting, and they change every year, in order to be considered canonical or not canonical, useful to Google, useful to Bing and DuckDuckGo and whatever, to make this stuff legible. That's a problem I think we all need to work on in some way. That was a really long answer, sorry.

Rob Hilliker from Rowan. I realized I didn't mention my name when I spoke before, but what Roger said actually flipped a switch in my head, which is: what about flipping the model entirely? When I ran a repository, I did look at SEO and try to optimize, and the reality is that unique content that lives on the web is something Google likes, right, especially if it's richly described. So if I've done that in my institutional repository, I'm getting the Google traffic. But what's not happening, and this is something I struggled with as a repository manager, is referrals to similar content elsewhere that has been curated. What if the model was that there's a mechanism to get DPLA discovery into all of our repositories, so that our SEO advantages over what DPLA does drive traffic laterally to other collections of related content? Would that perhaps be a better approach? I mean, again, it's difficult, because then the burden is on everybody to figure out how to integrate it, but perhaps the burden is also the advantage, because then you can integrate it in a way that makes sense for your organization.

That's really interesting.
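On the Lighthouse question raised a moment ago: scores can be tracked without manual audits through Google's PageSpeed Insights v5 API, which runs Lighthouse against a given URL. A sketch; the trimmed response below mirrors the API's `lighthouseResult.categories.*.score` shape, with invented score values:

```python
# Sketch: building a PageSpeed Insights v5 request and extracting Lighthouse
# category scores from its JSON response. Scores in the API are 0-1 fractions.
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def psi_request_url(page_url, categories=("performance", "seo")):
    """Build the GET URL; 'category' may be repeated to request several audits."""
    params = [("url", page_url)] + [("category", c) for c in categories]
    return f"{PSI_ENDPOINT}?{urlencode(params)}"

def extract_scores(response):
    """Pull 0-100 category scores out of a PSI JSON response."""
    cats = response["lighthouseResult"]["categories"]
    return {name: round(data["score"] * 100) for name, data in cats.items()}

# A trimmed response of the kind the API returns (score values invented):
sample = {"lighthouseResult": {"categories": {
    "performance": {"score": 0.62}, "seo": {"score": 0.9}}}}
scores = extract_scores(sample)
```

Fetching `psi_request_url(...)` on a schedule and logging `extract_scores(...)` is one low-effort way for a repository to track the moving metrics described above over time.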
Well, darn, I was hoping we would have this all wrapped up in 30 minutes, but I really want to thank you for a great discussion. This is the wonderful thing about bringing people together and having the ideas come out. You've taught me a lot today, so thank you very much. The Google Doc, I don't know why it's not showing there, will continue to be up, so if you have thoughts and ideas you want to add to it, please do. Thank you very much for coming.