I'm going to move over to our next presentation now, which is on preserving new forms of scholarship. This is a presentation by Jonathan Greenberg of New York University and Karen Hanson of Portico, dealing with a report that a whole team of people put out in late 2021. They're going to summarize some of that work and also talk about a very exciting second phase that's underway. So over to you, Jonathan and Karen.

Thank you, Cliff, and good afternoon, good morning, good evening, wherever you happen to be. I'm Jonathan Greenberg, digital scholarly publishing specialist at NYU Libraries. I'm joined today by Karen Hanson, senior research developer at Portico, to talk about two projects, the combination of which we're going to refer to as Preserving New Forms of Scholarship. The first project, Enhancing Services to Preserve New Forms of Scholarship, started in 2018 and ended in 2020, and as Cliff said, the report from that project, as well as a set of guidelines, were published last year; we'll say more on that soon. The second project is called Embedding Preservability for New Forms of Scholarship; it got underway last year and will run through 2024. Both projects are funded by the Andrew W. Mellon Foundation. NYU Libraries is the primary grant holder and provides project management and coordination for both. There is a whole range of partners on both projects: on Enhancing Services, you see here a range of university presses plus the preservation organizations CLOCKSS and Portico, as well as us at NYU Libraries. The genesis of our first project was a series of projects funded by the Mellon Foundation to further digital monograph publishing. These grants started around 2014, and projects such as Fulcrum and Manifold, among many others, built capacity for enhanced digital publishing.
While building that capacity, these projects generated a lot of really wonderful digital scholarship, but they could not depend on the preservation services they normally used to preserve all of this work. Portico, CLOCKSS, and other organizations, including library preservation units, had difficulty preserving these complex new forms of scholarship. However, there was still an expectation that all of this work should and would be preserved. These works still fit squarely within what we might think of as the scholarly record: they're cited as books would be; their creators depend on them to contribute to disciplinary and extra-disciplinary discourses; they're used by colleges and universities to determine advancement and tenure; and they become part of a record that future students and scholars expect to be able to access and cite. So there was a real need to improve this ecosystem of preservation for these new forms of scholarship. I should say that what we mean by new forms of scholarship is a little different from what the previous speakers were talking about. The work in the first project was largely on book-like objects with various features: multimedia, embedded data, embedded resources of various kinds, and platforms that allowed for non-linear forms of narrative. That's the direction we were investigating. The nature of these new forms presented real challenges to preservation specialists, specifically things like interactive features, remote resources, and non-linear narratives; sometimes a work might be seen more as a website than as a book, depending on how the user is meant to engage with the publication.
The first project had two main goals. The first was to determine whether the publications could be preserved in their current form, and whether it would be possible to do this at scale. Scale is very important for CLOCKSS and Portico and for others who have processes for, and contracts with, publishers for preserving their work. The second goal was to use these findings to develop guidelines that would help authors and publishers make similar publications easier to preserve, and, as a side effect, make those publications more sustainable. This goal really sat outside of the preservation organizations per se: to create guidelines that allow authors, publishers, and platform developers to make work more preservable, no matter how that work may be preserved in the end. Out of the project came a set of 68 guidelines that apply to a range of cases in digital publishing. They are designed for publishers, platform developers, and authors, although our primary focus was really on the platform developers and publishers, because in the works we were looking at they are often the ones most engaged in architecting, planning, and following through with the work of creating a digital publication. There are, of course, many other scenarios where authors are driving that technical work themselves. None of these guidelines are prescriptive, but all aim to improve and facilitate preservation. Our intention is not to make any publisher or author feel like they have to subscribe to any of our guidelines. There are always competing pressures, whether that's user experience or a particular way they need to present their scholarship, and sometimes the way a digital publication needs to be constructed makes it very difficult to preserve; the needs of the scholarship might simply create that friction.
We don't want to avoid that friction: we don't want to hamper scholars or publishers from doing really interesting and forward-thinking work. We want them to understand what the implications of their decisions are, and when there are opportunities to make decisions for more preservable scholarship, we want them to know how to do that. It's unlikely that all or even most of the guidelines will be relevant for any one project. We published these guidelines in two ways: first as a static PDF and EPUB available through NYU's institutional repository, and second as an interactive website that allows users to sort the guidelines by tags so they can narrow down which ones might be most useful to them. I'll read off the tags, because I think that gives a good sense of the range of guidelines we included: web-based publications, EPUBs, publishing platforms, planning, third-party dependencies, embedded resources, export packages, software, and data. I briefly want to talk a little about the kind of preservation this project is really about and how it differs from other ways in which libraries in particular think about preservation. Libraries and archives tend to think about preserving existing collections. The process starts from acquisition: whatever you bring in, whether it's a digital or a physical object, you are tasked with preserving that object. Preservation staff in libraries generally don't have control or influence over the creation of those objects. There have, of course, been collaborations at the industry level between publishers and libraries, for instance the adoption of acid-free paper in the 1980s, which was brought on by the development of standards.
By and large, though, libraries treat preservation as something they do to existing collections. When we think about what preservation means for creators and publishers, we're thinking about something a little different: applying standards and processes to the process of creation. Authors can change those standards and processes in order to facilitate preservation; they can change how they work and how they develop and publish scholarship, and they can make decisions about content and user experience to facilitate preservation. This is my last slide before I hand it over to Karen, but I wanted to talk about the ways in which various actors in the scholarly ecosystem might interact with each other to facilitate preservation of this kind of work. For those who create complex digital scholarship, we hope, as I said, to help them understand the risks to preservation as they produce their scholarship. They may decide that certain features are more important than the risks to preservation, and some scholars may eschew preservation entirely for various reasons. In many cases the decisions that go into developing digital publications come from publishers and platforms rather than from authors, though perhaps in consultation with them. Here our guidelines allow platforms to make changes, some small, some larger, that will aid in the preservation of many or even all of the publications built on that platform. Publishers can create local best practices based on their resources, workflows, and the nature of their digital publishing program. In some cases, publishers may need to work with librarians and preservation specialists to develop those best practices; we know that many publishers just don't have the resources or the expertise available in house.
All of this, I think, points to the need for a more collaborative ecosystem for digital scholarly publishing. With that, I will stop sharing my screen and hand it over to Karen.

Can you hear me and see my screen? Yes. Great. So, I'm Karen Hanson, senior research developer at Portico, and I'm going to be talking about the impact of the Enhancing Services project from the preservation perspective, and after that I'll talk about our new project. If you're not familiar with Portico, we're a not-for-profit, community-supported digital archive for preserving scholarly content. When the service was launched, almost 20 years ago now, our workflows were built around the concept of a publication basically being made up of linear text and images, and for those kinds of publications, if preservation is only thought about at the end, it will usually work out. But obviously a lot has been changing, and there are many more options for scholars to share their research. We've evolved in some ways to deal with the change, but we're seeing more and more complex forms of content come into the archive, and sadly they sometimes have pieces missing by the time they reach us. Enhancing Services was an opportunity for us to really understand what the challenges are and see how we could improve our workflows for them. Jonathan talked about the guidelines, so what do we see as their impact? First of all, hopefully it means we can preserve more things and do it well, though that may be hard to measure initially. Second, what we've seen already is that the document is proving really useful for communication: when we're approached about something complex or new, we use it as a reference and a tool for the discussion, so we can refer to it throughout. And third, I think we'd like this work in general to start a conversation. Jonathan mentioned a collaborative environment.
We'd really like to see more collaboration between the platforms, publishers, authors, and preservation services while new forms of scholarship are being designed. I think there are a lot of opportunities for us to work together and form some standards that will help make this content last much longer, and there are benefits to doing that even outside of preservation, because nobody wants these works deteriorating on their platform either; we're really interested in having those conversations and helping to work on that. Really, having this document out there is a call to action for us: now we know what our limitations are, they're documented, and we need to be ready with the latest tools on our end to meet the publishers at that line. So I thought I'd talk a bit about some of the areas we've been looking at enhancing since the end of the project. I should say this is speaking from Portico's perspective; preservation services start with different tools, so other services may see different priorities. The first thing that was really apparent was that a lot of the publications we looked at had a really large number and variety of files that were considered part of the work. In fact, more than half of the publications had over 100 additional files on top of the main text, and these included images and PDFs but also audio, video, datasets, software, and all kinds of things. In many cases these were referenced or embedded throughout the text using visualization tools and media players. There's an example shown here: in the middle you have a landing page for an audio file, which initially looks simple, but it has a transcript file with it and also metadata.
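To give a sense of what inventorying that kind of publication involves, here is a minimal sketch, not Portico's actual tooling, of a manifest builder that walks a publication directory and groups every supplemental file by extension, so audio, video, data, and software are visible alongside the main text before ingest. The function name and structure are illustrative assumptions.

```python
from collections import defaultdict
from pathlib import Path

def build_manifest(root: str) -> dict[str, list[str]]:
    """Group every file under `root` by extension (hypothetical pre-ingest check)."""
    manifest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Files with no extension get their own bucket so nothing is hidden.
            ext = path.suffix.lower() or "(no extension)"
            manifest[ext].append(str(path.relative_to(root)))
    return dict(manifest)
```

Run against a publication export, this makes the "over 100 additional files" problem concrete: each extension bucket is a separate preservation question.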
On the right you see that audio player embedded in the flow of the text without that context, so we need to make sure those relationships are expressed in the metadata and that the pieces can be presented together when they're made available for access. The second big theme was the overwhelming use of iframes to extend platform functionality; in fact, almost every publication we looked at had an iframe. If you're not familiar with what these are: basically, an iframe is an HTML element that visually embeds one web page inside another. If you've ever seen a YouTube video outside of the YouTube platform, that was probably an iframe; you see them for Vimeo, Google Maps, and ArcGIS visualizations, and you can imagine how they're being used in publications. The problem is that an iframe can point to basically anything on the web, and a lot of the time when you see these in publications, the only metadata you have is a URL, so it's not clear who owns the content or what it is, and it might be susceptible to link rot. That unpredictability is a real challenge for scaling preservation, so we're thinking about ways to handle those, and I think this is an example where working together around a standard would be really helpful. The third interesting theme was versioning. A lot of the publications we looked at did follow the traditional model of assigning a persistent identifier, after which the work stayed fairly static in terms of content. But there was a handful of publications where it was clear we were going to need to archive different versions over an open-ended period of time, because the content would continue to receive feedback and change, and in one case the feedback itself was considered something that should be preserved as well.
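Returning to the iframe theme for a moment: because an iframe's src is often the only metadata available, a first step toward assessing preservability is simply listing which external hosts a publication's HTML depends on. A minimal sketch using Python's standard-library HTML parser (an illustration, not a tool from the project):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class IframeScanner(HTMLParser):
    """Collect the src attribute of every iframe in an HTML document."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def third_party_hosts(html: str) -> set[str]:
    """Return the set of external hosts that iframes in `html` point to."""
    scanner = IframeScanner()
    scanner.feed(html)
    return {urlparse(src).netloc for src in scanner.sources if urlparse(src).netloc}
```

Each host in the result is a dependency whose ownership, rights, and longevity would need to be assessed before the work could be preserved at scale.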
We do have systems for managing versioning, but this is a bit of a different model, so we have to think about the best way to record this and give access to it in the future; it's quite interesting. Lastly, some of the content just had an incredible amount of interactivity, non-linear navigation, and visualization elements, and it really depended on being presented in that platform software. Filming Revolution was an example of a custom-built site that we looked at during the project, where we had to figure out how to preserve, basically, the experience of the work. For websites, one option a lot of people are familiar with is having a web crawler visit the pages from the outside and convert them to a web archive file so you can play it back; that's what the Internet Archive does. Another option is to copy the data and the code needed to actually build the web server, so that in the future you can rebuild it and run it on an emulator. We tested both of these, and each does well in certain cases and not so well in others, but both approaches, and how well they scale, really depend on how compatible the website is with the approach. If you look at the guidelines, you'll find that many of them are really about improving that compatibility, so we can improve the chances of scaling this process. Since the end of the project we've added a web crawler to our toolkit, and we'll be thinking for a while about how and when to incorporate emulation for access. That's not really for websites but for that massive variety of files, because some of them will just need the software they were created in to play them back. I expect that will be a ways down the road for us, but it's definitely something we're keeping an eye on. Switching back to the guidelines, I wanted to highlight some of their limitations as they relate to this discussion.
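The crawl-compatibility point above can be sketched in code. A crawler can only capture what it can reach from the site itself, so one rough heuristic is to split a page's resource URLs into same-origin (capturable along with the site) and external (subject to link rot and rights questions, and invisible to a server-encapsulation approach). This is an assumed simplification for illustration, not the project's actual analysis:

```python
from urllib.parse import urlparse

def split_by_origin(page_url: str, resource_urls: list[str]):
    """Partition resource URLs into same-origin and external lists."""
    site = urlparse(page_url).netloc
    internal, external = [], []
    for url in resource_urls:
        host = urlparse(url).netloc
        # Relative URLs have no host and are treated as same-origin.
        (internal if host in ("", site) else external).append(url)
    return internal, external
```

The longer the external list, and the more interactive the external resources, the less likely either a crawl or a server rebuild will capture the work intact.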
First of all, they were formed from suggestions by preservation analysts examining things that had already been published. We tried to figure out what could have been done differently to make those works more preservable, but there was no opportunity during the last project to actually see if the suggestions would have worked. Second, they were created without a full understanding of the implications for publishers: they suggested a lot of workarounds for preservation, but we really had no measure of how much effort that would involve or how practical it was to implement them. This basically adds up to: they're not fully tested in the publishing context yet. That brings us to the new project Jonathan mentioned, Embedding Preservability for New Forms of Scholarship. The project has several goals. One is to test the guidelines and the general concept of whether you can actually embed preservation into a publishing workflow. Another is to understand the effort, cost, and roles involved in improving the preservability of complex works. And finally, we want to update the guidelines to be more usable, evidence-based, and practical in a publishing context. I'll spend the rest of the presentation walking through how we see this project going. First of all, the project is going to partner with a variety of presses that create what we're calling new forms of scholarship. In the last project we really focused on those long-form, monograph-like works, but this time we've added some journal publishers so we can expand the guidelines for those as well. On the right, you see the publishers that have already agreed to participate: UBC Press, the University of Minnesota Press, ArtPublica Press, Gallaudet University Press, and Michigan Publishing, and we're still having conversations with a few more, so this list might get longer.
You'll also see that we'll be working with the platforms these publishers use and the developers of those platforms, specifically PubPub, RavenSpace, Manifold, and Fulcrum. With the partnerships in place, once a publisher starts the process of a new publication, we'll embed a digital preservation team into the workflow, and that team will stick with the publisher until the publication is actually published. That's why we have a three-year project: we want to be able to see the whole process through to the end. We're calling the team the embedding team, and it's made up of four members: a project manager, Angela Spinase; a digital preservation expert from University of Michigan Libraries; an expert from CLOCKSS; and myself from Portico. While the publisher is working on a new project, the embedding team will carry out a series of interviews, and through these we'll learn about the publisher's process and platform. During that period we'll also provide support as the publisher selects and implements any guidelines; that could involve suggesting guidelines we think they might have overlooked, or working with the publishing team and the platform developers to implement solutions. We're hoping that one outcome of this will be tangible changes to the platforms and processes that will then benefit other publications that use them. Another part of the work is to assess the guidelines document itself: how easily can publishers determine what they need to do, how much support do we have to give, how much work is required to implement the guidelines, and so on. Finally, we'll be reporting all of this out to a bigger team, who will synthesize this information and give us further feedback; that includes a broader team from NYU Libraries, and we'll also be consulting with Webrecorder on any web archiving challenges.
Once a publisher reaches the point of publication, the preservation services will actually test whether we can preserve the final works. We'll evaluate how easy each work is to preserve, whether anything is still a challenge, and in general how effective any of the interventions have been for improving preservability. As we complete all of these steps, the broader team will be following along and feeding this information into updating the guidelines. That's where we'll add anything we see is missing, refine them based on concrete evidence and a much richer understanding of how they work within the process, and finally consider different ways to format them: can we make them more usable and accessible for the people who are actually working in the process and developing new features? Right now we're at the point of finalizing those partnerships and preparing to start the embedding process, and we look forward to being able to come back and give you updates as we make progress. I'll end with that. Thank you for listening. I'm not sure if we have time for questions, but we're happy to take them, and I'm happy to stay through the break and help answer them.

Karen, we do have time for questions, and I would welcome questions for Jonathan and Karen on this really exciting project. Please either put them in the chat or raise your hand if you'd like to ask them verbally. I'll jump in with a quick one. I was very intrigued by the distinction you made between essentially doing a web crawl versus capturing everything so you can fundamentally reconstruct the site. Could you say a little more about the pros and cons of those two, and what kinds of sites seem to lend themselves particularly to the capture-all-the-assets-so-you-can-rebuild-the-site approach?
For the emulation process, the ones that were a really good match were the really data-driven sites: if there are a lot of search queries, a search engine, or anything with really specific user interaction, like entering a search term, that's where that approach is better, because then you have the underlying database and you can recreate that functionality; crawlers aren't so good at those. The one I showed actually worked with both methods, to some degree. The other thing about the emulation process is that you have to be able to encapsulate the server; there are some interesting blog posts from Stanford University Press about that process. We basically realized that there were certain assets outside of the server that wouldn't be part of that package, even though the work depended on them: there were some videos on Vimeo, and there was a font. We measured how long it would take to move those things inside the server, and it took basically two days, and then the whole thing was encapsulated and able to be emulated that way. So I think two things point to emulation: the data-driven works that are hard to crawl, and the ability to encapsulate the server.

Thanks, that's really helpful. Robert?

Thanks, Cliff. My question is for Karen, and she touched on it a little in her answer to your question: what are some of the things you're looking at in terms of determining what constitutes "more impactful"? You mentioned the length of time it takes to encapsulate a server as one of them, but I'm curious what other metrics go into that calculation, and maybe other heuristics that don't get captured as numbers as well.

I think probably the one that came up as the most impactful is dependency on third-party services.
There were some publications that already had broken links, with just gray spaces in the publication, because YouTube channels had gone away and things like that. I think the most concerning are those iframes I mentioned, where they're pointing to things around the Internet and maybe we don't know what the rights are; those are the most challenging. The other big one is the content where we can't use either of those web preservation approaches, cases where neither of them will work. Typically that's the same issue: dependency on proprietary software or on third-party platforms. Some of the most difficult ones were where the majority of the interface was a map, an ArcGIS map or something using some other third party; you basically need to recreate that functionality, or figure out something else to do with it, and that's one of the most difficult cases. Did that answer the question?

No, that's very helpful.

I just wanted to add that we always started off with an examination of what was essential, what the most important pieces of any digital publication were. What is preservable is always in relationship to what constitutes the essential parts of a particular publication. We have all kinds of digital publications where there may be supplemental features that are not essential, and that publication may be very preservable for its purpose in the scholarly record even if you can't preserve all of its pieces. But there are other publications where the map is absolutely essential to the user experience of the work as the scholar and the publisher envision it.

Well, thank you for those. I think it's probably time for us to go to our break. We will start again in about 12 minutes.
That will be about 2:30 Eastern, and we'll see you all then.