Welcome everybody and thank you for joining us. I'm Cliff Lynch. I'm the Director of the Coalition for Networked Information. And you have joined one of the project briefing sessions for the spring 2020 CNI virtual meeting, which is scheduled to run till the end of next week. We have a presentation with five speakers drawn from quite a range of institutions, and you can see everybody's name and affiliation on the slide here. They are taking up an important topic, which I will introduce in a moment. After everybody has given their part of the presentation, we will field questions and try to respond to them. Diane Goldenberg-Hart from CNI will moderate that Q&A session. There is a Q&A tool at the bottom of your screen, which you can use to type in questions. And while we'll address all the questions at the end of the presentation, I'd invite you to queue up questions as they occur to you during the talk. There's also a chat box, and we'll be sharing a few URLs there, so that may also be a useful thing for you. So, with that, let me just say a couple of words about this topic, and I'm really delighted to have this presentation. As all of you who have been involved in scholarly publishing, or indeed digital humanities, well know, it's been a tremendous ongoing challenge to get people to take advantage of all of the affordances of the digital environment as part of scholarly communication. And a big piece of that has been the persistent concern that their work will vanish after a certain period of time because it will become technically obsolete and unpreservable. And because of that, scholars have been very cautious about straying from things that, you know, are sort of the moral equivalent of print on paper, even if it's done digitally. To really address this we have to figure out how to do it at scale. And that's why I think it's so important that we have university presses, who are accustomed to thinking at scale,
and organizations like CLOCKSS and Portico, who are trusted preservers who think about preservation at scale, as a part of this discussion. And I'm really hopeful that out of the kind of work that's reported here, we can see the emergence of genuine strategies at scale that will address this problem and make our scholars comfortable that they can use at least some well-defined set of additional affordances of the digital environment. So I'm really interested to hear what our speakers have to say. With that, I've gone on way long enough, and I'll turn it over to David Millman to start the presentation. Thank you all for joining us and a special thanks to our presenters today.
Hi, thanks Clifford, and glad to be here. Thank you all for coming. Today we'd like to give you a progress report on our project to enhance the preservation of new forms of scholarship. I don't want to introduce it too much more because I think Clifford actually gave a really good summary, and we're just happy that everybody's able to be here. The project is organized in three distinct phases, and we call them sprints. Today we'd like to report primarily on the first one, which we completed earlier this year. The second sprint is wrapping up now, and we'll talk a little bit about some of the lessons from that sprint today too, I think, and the third sprint is starting up around now. We were slowed down a little bit by the pandemic, but as we'd been largely working with each other remotely already, we've been able to continue along pretty well. So today our focus, along with CLOCKSS and Portico, is on two of our university press publishers, Michigan Publishing and NYU Press. Michigan has developed an online platform called Fulcrum, and many of you may have seen presentations about it at CNI in years past. And NYU Press has developed the Open Square platform, and we'll tell you more about that in a few minutes.
Then we'll review the experiences from CLOCKSS and Portico about how they dealt with these publication platforms. And finally we'll talk about next steps and our recommendations. So first up is, let me just see if I can advance my slides. First up is Jeremy Morse, Director of Publishing Technology at Michigan Publishing. Jeremy will hand off to Jonathan Greenberg from NYU, who will talk about Open Square. Next after Jonathan will be Thib Guicherd-Callin from Stanford Libraries, who run the technology that underlies CLOCKSS and LOCKSS. Thib will hand off to Karen Hanson from Portico, and Karen will then turn it back to Jonathan to hear about next steps, and we'll hopefully have time to take some questions. So Jeremy, over to you.
Okay, hi, hopefully you're seeing my presentation okay. I'm Jeremy Morse, Director of Publishing Technology at Michigan Publishing, a division of the University of Michigan Library, and I manage the development of the Fulcrum platform. I'm here to give an overview of the way we present scholarship on our platform and how we're aiming to see it preserved. Okay, we'll start with a brief tour of what we're doing on the platform, just for an overview. Fulcrum started from the idea that the digital resources that are generated in the course of researching and writing a scholarly monograph should be co-presented along with the finished ebook in a way that supports the scholarly argument. To this end, Fulcrum is first and foremost an ebook platform with a particular emphasis on supporting EPUB 3. But Fulcrum also serves as a repository to make resource files discoverable and downloadable, with a growing number of formats viewable in the platform's user interface. And if desired, these resources can be presented in the flow of the text. So to look at some examples, here we see the landing page for a book in the ACLS Humanities ebook collection, which is hosted on Fulcrum. This is volume two of the Applanta series.
And by clicking the read book button or one of the entries in the table of contents, we open the EPUB in our ebook reader. And here you can see, in this case, there's a figure that is included in the EPUB file itself, so it's packaged in the EPUB file, as you would expect to see in any downloadable ebook. But if we follow the link for the resource, we arrive at the resource's own landing page, where we not only see its metadata, but we can view it in more detail in a IIIF viewer. Now, if we go back to this book's landing page, we can select another tab and get an overview of all the resources for this title. And here's an example of another book, Animal Acts, which has a number of video resources hosted on the platform. Rather than include them in the EPUB file, which would increase its size considerably and would also mean using the default HTML5 video player, which is fine, what we're doing is embedding them using an iframe embed code, which is generated by the platform. That allows the reader to play the video using the third-party application Able Player, which we've integrated into the platform, into our e-reader. And that lets readers take advantage of Able Player's features for caption display, speed manipulation, etc. The IIIF viewer from our previous example could also be embedded in the same fashion. The author in that case just decided to go with a different kind of presentation. In our most complex title on the platform, A Mid-Republican House from Gabii, a 3D scene encapsulated as a WebGL object is presented side by side with the EPUB so that they can be navigated independently while also taking advantage of cross-linking. So that would be another example of not being packaged in the EPUB, not being embedded in the EPUB, but just being co-presented on the platform. So what exactly are we trying to preserve of all of this that we're presenting?
I should note, the goals I'm articulating here are those of Michigan Publishing, and they are in line with those of the University of Michigan Library generally in terms of their preservation strategy. But those principles guide the development of the Fulcrum platform and therefore apply to the non-Michigan content that we also host. So our overall strategy is focused on preserving what we consider the version of record: those components which can be assembled into the scholarly argument, and the description necessary for that assembly to provide a functional work. For example, while we think having the video embedded within the text makes for a better reading experience, a link to where that video is viewable is probably adequate for preservation purposes. So you can shoot high, but we have to think of what our reasonable fallback is. So what's important to us is that you can cite this work today and be confident some future reader can follow that citation and make sense of what they find there: understand the argument, understand why you cited it, see basically what you were seeing when you made that citation. What does that leave out of scope? We're not focused on preserving the Fulcrum presentation of the content as you see it today. Navigation, for example, might look very different. Maybe formatting changes too. While we're interested in the possibilities of emulation, our investment continues to be in a format migration strategy. And while we strive to present our content in the best manner possible, we recognize that the preserved form, while maintaining an essential functionality, may be reductive in some way due to cost constraints. In other words, we're satisfied with not locking in the design choices or even the implementation that we've made today, because in general we think that we're going to be able to improve them over time anyway. So we're trying to keep the content fluid to enable a better future for its presentation.
And with those goals in mind, I'll hand it over to Jonathan.
I think you're muted, Jonathan. Let's see, Jonathan, can you turn on your microphone?
So I'm Jonathan Greenberg, I'm the Digital Scholarly Publishing Specialist, and I report jointly to Digital Library Technology Services in NYU Libraries and to NYU Press. My role in this project is both as a representative of NYU Press and their enhanced digital publications, and as a member of the team in the digital library department in NYU Libraries. Today I'm mostly going to be talking about NYU Press and its Open Square platform, which you can see on this slide. Open Square is an in-house platform for open access books. So unlike Fulcrum, Open Square is really just designed as a platform to display books published by NYU Press and only NYU Press, and to do it in a way that is efficient and does not take excessive resources away from the press or from the library. So it's sort of a working model for a sustainable, library-built ebook platform for academic books that uses available open source software. So we use the Readium JS viewer in Open Square and have plans to implement a next generation of the Readium EPUB viewer in Open Square when that is ready. We're currently working with some outside developers to help develop that next-generation reader. The other goal of Open Square is to provide a place for the press to publish enhanced ebooks. And that's where this partnership really comes into play, between the digital library, which is a department of software developers and project managers who can really work on developing the technology required for enhanced ebooks, and the press, which has a large and important list in media studies. Many of the media studies scholars who work with the press have a keen interest in publishing these very new forms of scholarship that this project is devoted to.
And finally, Open Square was developed as a kind of testing ground for research projects such as the Enhanced Networked Monographs project and for this one. And like Fulcrum, we decided to use EPUB as a standard format. EPUB is already something that is produced in the press's regular workflow, and it's something that most publishers already produce. And so if we're to develop a kind of efficient platform where the press can deposit open access works directly into this platform, then EPUB is really the way to go. Or it was for us. So, preservation challenges. And these apply, I think, equally to Fulcrum and to Open Square. EPUB is a rich, flexible standard because it's designed to accommodate a range of textual and non-textual features. EPUBs are basically a package of HTML, CSS, and SVG, but can have all kinds of other things included in them. Anything that can go inside that HTML can be valid inside of an EPUB. So it can include code, it can include references to remote resources. And so developing a preservation strategy for EPUB as a whole standard is a kind of challenging project. However, most scholarly books, I would say, aren't so bad. You know, most scholarly books contain some combination of text and images that we're used to preserving. We need, of course, to get proper metadata and proper processes in order to deposit these works into repositories. But for the vast majority of scholarly books, I'm quite confident that we can develop processes to preserve them. However, once we start to enhance these ebooks, and enhance them in particular ways, the challenges start to appear quite quickly. NYU Press has published two books as enhanced ebooks, and both of them make use of remote resources in some way or another. One of the books, By Any Media Necessary, includes YouTube videos that are embedded at various points in the text. Shosel separately includes both embedded video, this time hosted on Fulcrum, and embedded images.
And these are the images and video that we see on the Fulcrum platform. This book was produced jointly on both platforms as part of the development of Fulcrum and of Open Square. So, you know, this is something that the authors at NYU Press are asking for, something that we would like to support, but we really didn't know how we could efficiently and sustainably preserve these works. So we were very, very keen to work with Portico and CLOCKSS on this project to develop these strategies. And with that, I will hand it over to Thib.
Thank you, Jonathan. I'm Thib Guicherd-Callin, I'm the acting program manager of the LOCKSS program. I'm going to share my screen so you can see my slides. The LOCKSS program is a 20-year-old institution at Stanford University Libraries, but we are also the technology partner for the CLOCKSS archive, which is one of the preservation services represented in this group work. And so, before we delve into some of the specifics of this project, just one slide of general methodology notes about LOCKSS preservation technology. LOCKSS preservation technology is rooted in web preservation, the preservation of web-based materials and web-native content. The LOCKSS plugin is how you teach the LOCKSS software, which is sort of a very general framework, how to harvest and process a particular preservation target, for example an ebook publishing platform or some other form of content. The CLOCKSS archive uses two main preservation workflows: one that is more like the traditional LOCKSS methodology of preserving web-native content by harvesting it from the publisher's website, and a separate workflow where content is transferred directly from a content provider into the archive and stored as is, as source content, so it can be whatever files the publisher has provided.
And so, using a LOCKSS plugin to preserve any particular piece of content is an iterative process. A typical work cycle would be an analysis phase, adding and editing rules and code in the LOCKSS plugin, crawling the target, and then reviewing the results and trying out the replay. This feeds back into a new analysis, which completes the circle, and iterative development occurs in this manner. I hear Cliff's point that publishers have been shy about straying too far in the electronic era from the moral equivalent of publishing a hard copy book, but I think that boat actually sailed 20 years ago. The era of "URLs are documents" is long gone, and even though publishers don't intend to have electronic content that strays too far from the equivalent of publishing a book, the implications of their technology choices and of the web environment have actually strayed from that long ago and have made web-native content harder and harder to preserve. So here are just a few common misconceptions about web content. URLs are documents: that's not the case anymore. HTML pages are self-contained: they used to have links to images and JavaScript and CSS, but now these things are themselves dynamic, and there is oftentimes no universal way to discover all the URLs the browser will generate and fetch at runtime. HTML pages are essential content plus personalizations: nowadays that's also not so true. Content is rendered onto a canvas of pixels, and the HTML page is assembled from multiple sources. Authors provide the work to the publisher and the publisher provides the work to the reader: that's also not true. Videos and images are now hosted on third-party services. Some of the primary research content is now hosted in arbitrary places on the web, like YouTube or other places. Even PDFs and EPUBs are no longer static bundles, they're now dynamic. They're logical bundles but they're not physical bundles anymore.
The other ones are no longer provided by publishers. So they are now from third-party services that can go away or be unavailable at any time, and even images are no longer static resources on the web. So all of these factors contribute to web-native content being increasingly difficult to preserve. There are six works that we focused on from the Fulcrum platform, so this particular presentation is about experiences with the Fulcrum platform. Here's a screenshot from Animal Acts, which we saw earlier. Here is now another screenshot of what it looks like when re-rendered after having been preserved in a LOCKSS environment. As you can see, some of the fonts don't look right, some of the images are too big. You can tell that in spite of our best efforts we could not discover all the JavaScript and all the CSS that is necessary to render the page correctly. Here's another example from volume one of Applantus, and here's what it looks like in re-render, and obviously it's the same thing: there are some fonts that aren't quite right. There are some specific examples of things that are hard to preserve with a faithful re-rendition at replay: the fonts, some of the colors, right, I mean this is sort of stuff that's embedded in CSS. Some of the characters are now from sort of dingbat or emoji fonts, and so the characters may not be present at replay. Some of the spacing of the tabs is wrong, and some of the thumbnails are too large if they don't have all the information. These arrows are pointing to dynamic features: you can sort and filter the content by date and by size, and you can have 20 per page or 40 per page or 100 per page. You can generate different URLs for what is the same content, the same stuff, but it's just rendered differently, and that generates a combinatorially large number of URLs.
It's also searchable by, you know, keyword and section and various other ways, and that cannot be preserved without having access to a lot more of the underlying platform and then attempting to replay that platform, and that's not easy to do. Here's an example of something that was difficult, even though the publisher did not intend to make it difficult. This is just a link to a resource page for that particular video clip. As you can see, it's a very standard A link in HTML, and it has an href to some file, which was obviously discovered and preserved. But there's this little data-context-href additional tidbit, and JavaScript embedded in the site hijacks normal browser behavior, so when you click on that link it doesn't go to the target of the link. It first goes to that side link, which I guess is used for tracking, and then moves the browser to the target URL. So, because this side URL for tracking purposes is not preserved, at replay time the browser complains: it was forced to go not to the target link but to the tracking link first, and the tracking link is not preserved, so that's a 404. Here is an example of a video asset; this is a screenshot in a browser on the real live website. This is a screenshot from replay, so this is fairly faithful: the video plays using an HTML5 player, and the transcript follows along at the same time. Here's an example of an image resource live on the website versus as replayed from the LOCKSS system, and you can click on the download PDF button or you can enlarge the image and you will gain access to that resource that was embedded in the work. This is a more difficult one. This is a map, and as you can see it has little zoom widgets and buttons, and it's really made of tiles, it's not really a single image. It is served dynamically by a IIIF service. And so in replay, it looks okay until you try to zoom in, and then it does not know the tiles. So it displays a gray box instead.
If we zoom all the way in, then we have found the smallest tiles, which we generated automatically by reverse engineering what the particular IIIF widget would like to do with these tiles. But if the widget were to change, or if it decided to request arbitrary subsections of the image, we would not have those. So this is a screenshot of the page-turning widget. And obviously, we did not actually preserve this. We're preserving the PDF and we're preserving the EPUB as bundles of bytes, but we did not have enough time to work on re-rendering the arbitrary EPUB widget and the page-turning widget. So in replay, the work is being preserved: if it were to be triggered, we would have all the EPUB and all the PDF to at least give a somewhat comprehensive experience, but not an interactive experience. So just a quick conclusion before I give up the floor to my colleague Karen from Portico. There were some known lessons in here, certainly within the area of web preservation, that we know about: combinatorial explosions of equivalent URLs are difficult to preserve, dynamic font resources are difficult to preserve, and the JavaScript hijacking of browser behavior makes it more difficult to replay the preserved work even if we have all the pieces of it, and these interactive rendering environments are difficult. But in this particular instance we were also faced with some new lessons learned. Even images are not static anymore, and so it is difficult to obtain them and then replay them in the interactive way that is now more common in enhanced environments. And the fact that external resources are present in EPUBs is making it much more difficult to preserve them, because they can be in arbitrary places on the web. And I know that Karen has much more to say about this embedded EPUB resources topic. Thank you for your attention, and it's now Karen's turn, for Portico. Thank you.
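The tile problem described above can be made concrete with a small sketch. It assumes the map is served through the IIIF Image API with version 2 style URLs of the form region/size/rotation/quality.format, and that a viewer requests tiles on a regular grid at a few scale factors. The function name, the base URL, the tile size, and the scale factors are all hypothetical here (a real service advertises its own values), and this is not the code the LOCKSS team used. It only illustrates why an archive that harvests one predictable grid of tiles has nothing to serve when a viewer asks for any other region.

```python
import math

def iiif_tile_urls(base, width, height, tile=512, scale_factors=(1, 2, 4)):
    """List the tile URLs a grid-based viewer would likely request.

    base is the image's IIIF base URL (hypothetical in this sketch);
    width and height are the full image dimensions in pixels.
    """
    urls = []
    for s in scale_factors:
        step = tile * s  # edge of one tile region, in full-resolution pixels
        for y in range(0, height, step):
            for x in range(0, width, step):
                w = min(step, width - x)    # clip the region at the right edge
                h = min(step, height - y)   # clip the region at the bottom edge
                out_w = math.ceil(w / s)    # width of the scaled-down tile
                urls.append(f"{base}/{x},{y},{w},{h}/{out_w},/0/default.jpg")
    return urls
```

Even for a modest image this list grows quickly as zoom levels are added, and any request outside the grid (an arbitrary region, a different tile size) would still miss at replay.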
So I shall attempt to hijack the screen. Hopefully you can see my slides now. So I'm Karen Hanson, the senior research developer at Portico, and I'm just going to be telling you about what we did with some of the examples you saw from Fulcrum and Open Square. And just to mention, this is the first phase of the project, so some of the more advanced works that were shown for Fulcrum we didn't get to yet, so that's still to come. So I just want to mention a couple of things about Portico's perspective. When a publisher participates with Portico, we spend some time with them configuring a workflow for their specific content. That could take, you know, days to weeks depending on the complexity and what the goals are. So when we're talking about scalability of a solution, it's in that context for Portico. That's what it says. So if a publisher can no longer give access to a resource for whatever reason, we might initiate a trigger event, in which case it will become accessible on our platform, and sometimes that has to happen quite quickly. We don't expect our users to be file format experts or anything like that, so we want to present it in a usable form in which they can see all of the intellectual content. So Jonathan already touched on some of these, but essentially an EPUB is like a zip file with a website in it, basically encapsulated. It expresses the content as XHTML, which opens up the possibility for transforms. And in version 3, it supports remote multimedia. I should just mention that we didn't do the web harvesting approach; we received all of these items as packages containing EPUBs. So that's what you'll see here. And the specific issue we focused on in the first section of this project was these remote resources, and what I mean by that is where visually it's embedded in the EPUB, but the file itself actually lives outside of the EPUB.
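As a rough illustration of what a remote resource looks like inside the package, here is a minimal sketch that opens an EPUB as the zip file it is and lists embedded elements whose source is an absolute web URL. It is only a sketch under stated assumptions: the function name is hypothetical, it looks only at src and poster attributes in the XHTML or HTML content files, and a simple pattern match stands in for proper XML parsing. It does not represent Portico's actual tooling.

```python
import io
import re
import zipfile

# Matches src="http(s)://..." or poster="..." but not data-src or similar.
REMOTE_SRC = re.compile(
    r'(?<![\w-])(?:src|poster)\s*=\s*["\'](https?://[^"\']+)["\']',
    re.IGNORECASE,
)

def find_remote_sources(epub_bytes):
    """Map each content file in the EPUB to the remote URLs it embeds."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue  # only inspect the content documents
            text = zf.read(name).decode("utf-8", errors="replace")
            urls = REMOTE_SRC.findall(text)
            if urls:
                found[name] = urls
    return found
```

A report like this could be run when a package is received, to show which chapters depend on files that live outside the EPUB and would therefore need a decision about linking or embedding.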
And when I first started to research this, what I thought I would see is that just the file was outside of the EPUB. And then it would be a matter of the choice of whether we, you know, create a persistent link for that and update the link internally in the EPUB, or embed the resource inside the EPUB. However, as you might have picked up on from the examples, we came across a few challenges with this assumption. So the first one was with Fulcrum's platform, and Jeremy showed you some examples where they have these iframes embedded in the work, not just referencing an external file, but actually a nice custom viewer. And that's just a view into another web page, and you can even pop that out in a browser and you'll see the features there. So when they gave us the package for the EPUB, what was in it was the EPUB with those iframe references inside. So we didn't have the media viewers. We had all the multimedia files as separate from the EPUB. And then we had a really comprehensive metadata file that essentially gave us instructions to map the iframe paths to those media files. And I believe that if we preserved this as is, when we triggered the content it would look like this, assuming that the Fulcrum URL was no longer available. And so there was nothing here to indicate how to get to that thing, and actually the link to the resource is inside the iframe as well, if you want to get to that landing page. The books varied, but in some cases that was true. So preserving it as is would basically be asking the user to assemble this. And that would require some knowledge of EPUBs and of the paths and things, so we wanted to see if we could do a better user experience with this and preserve some of that experience. So we wanted to see what it would take, given the instructions that Fulcrum packaged for us.
What would it actually take to make that whole again, and what would be lost in the process compared to the original? So the first thing we would have to do is take out those iframes and replace them with generic HTML players. And the instructions are there to do this; I was able to do it with XML transforms. But you lose the quality of the Fulcrum player. We were able to maintain the subtitles, since the video tag for HTML5 actually supports that. And then for the audio player, again using generic HTML5 tags, we lose this nice scrolling transcript feature. I put that in a text area instead, basically because the generic audio tags don't support a transcript. And then for the images, I removed the iframes that were displaying these nice image viewers and instead just embedded the image directly. So the second step we might do, and I think this is kind of optional because you could do it two ways, is to get a nice self-contained package so that the whole thing is portable: you might move a lower-resolution copy of the file inside the EPUB, and that would be a simple case of rearranging the package and pointing the links inside. You could of course leave them outside and generate persistent links using something like ARK IDs that can point right to the file. But this is another way of doing it. And third, where there isn't a caption already, you would want to maybe include some of that metadata under those embedded features, and then include a persistent link to the landing page for the new location of the resource. So imagine if this was already here; it would just be a matter of repointing the link to the new location. So what you get is an EPUB package, and there are some pros and cons to the result. So, on the plus side, you get a usable, self-contained EPUB; you're no longer depending on specific web pages to be there.
So you have all of the core intellectual content in a single package. And because you have these external links to landing pages, those landing pages could evolve over time and maybe introduce some of those things that were lost, so we might have a nicer image viewer, for example. On the downside, you end up with an EPUB that's much bigger than the original. And if there's a lot of media, that could be problematic. You lose all of those nice Fulcrum experience pieces to do with the media players. So the other challenge we saw was with Open Square, and Jonathan mentioned this: By Any Media Necessary has these YouTube videos embedded. So of course you can imagine over time that this might happen, where the video could just disappear without warning. And in this instance, the publisher was not able to get permission to copy this video and preserve it with the package. Portico's policy is that we will only preserve what the publisher gives us, so what we have direct permission to preserve; we don't go out to URLs and try to capture content. So I was trying to think of a workaround for this. And one thing I came up with that we might be able to do in an automated way was to go through the EPUBs and identify anywhere that there's an iframe with a YouTube reference, and then use the Internet Archive Save Page Now service, which you can use as an API or manually. It gives you back an archive link, and that could be automatically appended under the video as a little note to link to the archive copy. So if you click on that link, it points to the Wayback Machine, and you can see a copy of the video there. So now you have a means to get to the video if it's no longer available. On the downside, if the video is still available, you're going to have this mysterious archive message that is interrupting the flow of the book. If the video is gone already, you're still going to have a gray box.
And you lose that convenience of having an embedded playable video. And also this needs to be done as early as possible, because by the point of preservation the video might already be gone and you might miss the opportunity to capture it. I think ideally a publisher would try to secure the rights to take a copy of the video, which can be difficult when you're trying to track down YouTube users and whether it's their content and things like that. But if you are able to get rights, there's this tool called youtube-dl. And I think it might be what the archive uses, actually, but that will copy the video and give you the file and the metadata. So you could include it with the preservation package. And I think there's a really narrow set of use cases where Portico would do something like this, because of our policies around getting permission. But for individual cases the publisher might be able to use this tool. It is a command line tool, so you could integrate it into another system or you could use it manually. So, a couple of conclusions from this. I've shown you a number of ways that we can put together these packages so that they're easier to preserve and easier to access. But each of these comes with risks. First of all, each example involves actually visually modifying the EPUB, which is something we try to steer away from; we want the publisher to decide how their works should look. So that's, you know, a decision we don't want to make. Also, the process is likely very fragile. If you've worked with anything like HTML, you know a tag could appear that you weren't expecting and all of a sudden it looks awful. And, you know, you would not be able to detect that easily or in an orderly way. It's also a lot of extra configuration. Some of these took several weeks of work and a number of transforms, so that could be very costly to configure.
And finally, by the time we're taking preservation steps, those remote items might already be gone. So we might not even have the ability to capture them or know what they looked like. So all of this is to say that ideally these kinds of steps would be upstream on the publisher platform, where they could become part of the process of creating an export or part of the preparation process for a work. So just to summarize, I wanted to generalize what I saw here. This is about EPUBs, but these points could be thought about more broadly, I think. In general, it's easier for us to just replace a link than to try to reconstruct what something should look like or reconstruct functionality. So it's good to make any important presentation code internal to a work. If resources have to live outside of the EPUB, it's good to use URLs that are easy to configure over time, preferably persistent identifiers. In the case of DOIs, we can actually point them at Portico if that's where the access copy is. Sorry, got a little frog in my throat. And also it's good to display useful captions under embedded features that might provide information to help find alternative resources if it doesn't work. Finally, it's good to assume that any third-party features like a YouTube video are just inherently unreliable. So think about ways you might want to get the vital information from that in case it's gone in the future, whether it's a description or still images or things like that. And I'm going to hand it over to Jonathan and take a drink now. Thank you. Jonathan, I think you're muted. Thank you, Karen. I just want to briefly go through some preliminary findings from that first sprint. I'll say just generally the goals of the project are twofold. One is to develop processes at scale for preserving these new forms of scholarship. And that is where Portico and CLOCKSS really come into play.
They're, you know, among the parties that would presumably be doing this preservation. But the second goal is to write recommendations and best practices for authors and publishers so that, just like Karen was saying, upstream, when publishers are considering publishing enhanced books or other kinds of new forms of scholarship, they're aware and they're already thinking about the risks and the benefits involved, what technologies they're using, how they're doing it, and they have clear-cut guidelines for doing that. So in this project, after each sprint, we are going through a process of putting together all of the findings from that sprint and discussing them with experts across the country in digital preservation, legal experts, publishing experts, and here are some of the things that we are thinking about after sprint one. Publishers and authors should be thinking about the centrality of elements of the user experience to help determine what really needs to be preserved. And I think Fulcrum has done a lot of really great thinking about this. There's always this tension in what we're doing between preserving the user experience and preserving the work, because in many cases, and especially in many cases in later sprints, the user experience is actually a part of the work. And drawing those boundaries can be quite difficult. But this is something that's really important for authors and publishers to think about from the start. Second, in cases where authors and publishers decide to include links to at-risk content, they should be aware of the risks and trade-offs. And this is something that we really plan to provide guidance on in our best practices document. So one idea, and Karen kind of referenced this, is that when external media are included, it would be a great idea to have some way of making those links stable so that the location of the media can be moved.
But the code, the links in the code, do not have to be changed as they are moved into a preservation repository. And one idea that we were kind of turning over and discussing with many of the partners in this project was the possibility of embedding content into a large EPUB for the sake of preservation. Now, this would not be an EPUB that would be easily used for exchange. You couldn't send it easily over the web. But perhaps for the purposes of CLOCKSS or Portico, the ability for publishers to create such an EPUB, a valid EPUB according to standards, would really aid in the preservation process. So I'll just say before we go to Q&A that I think this project deals at a high level with tensions between digital preservation principles and the business requirements of solutions at scale. There's often a temptation to do too much, because we can say, well, here's a kind of technology, here's a solution that would work for everything, and so we're tempted to use it across the board. And similarly, there's a temptation to do too little, because we might say, well, we can do this for every publication and that would scale, but it may not in fact honor the scholarly content that needs to be preserved and then accessed in the future. And so this project, I think, is about exploring the range of new forms of scholarship in these digital monographs and coming up with guidelines, perhaps, for how to treat different features and different types of technologies in this landscape. So with that, we are very happy to take some questions. Great. Thank you. Thank you all so much. That was a really fascinating overview of the issues involved in preserving these enhanced monographs and the work that you all are doing to tackle some of these challenges. So I want to go ahead and open up the floor for questions. I'm sure there is a tremendous amount of interest in the work that you're doing and its implications.
And we do have a question from Rob Cartolano already. Hi, Rob. Looks like Rob has a two-part question here related to this last point, embedding content into a large EPUB for preservation. Number one: is the EPUB 3.2 container format sufficient for this purpose? And number two: does this also support usability over time, in addition to preservation, for example, to be used with other readers such as SimplyE and other Readium and non-Readium EPUB reading engines? I can take a stab at that. So I think if you are going to do simple embeds in the form that I was talking about, then I think it is compatible. Actually, the iframes were causing some validation problems and also didn't play well in a number of readers that I tried them in, whereas doing simple embeds worked much more reliably. Okay, thank you. Okay, I'll just add that I think there's a lot more thinking to be done about that idea of a large, you know, hefty EPUB. But I think one of the challenges is on the publisher side, which is to say that publishers are not used to creating such a thing, and it would be a special thing for this purpose. So that's something to explore on their end. I don't know if anybody else wants to weigh in on that. If I have a second to also inject a thought, I would say that in general small adjustments from publishing platforms to facilitate preservation can go a long way. And in my experience publishers are reluctant to engage in them because it is a special project and it does not come with a big payoff. And so I think, generally speaking, it needs to be more about directional awareness from, you know, management than about the technology. We have workarounds, we have solutions, but sometimes it's difficult to convey that the preservation aspect is just as important as other accessibility aspects. If I could chime in as well.
Just thinking about our production process, you know, we're producing one EPUB file that goes out to vendors, like Barnes and Noble, say, that conforms to their constraints and what they'll support in their reading systems or what they'll even allow in their distribution channel. For instance, they don't tolerate videos being included in that EPUB file. And then when we upload them to Fulcrum, we have a process where we're optimizing them for the Fulcrum platform, specifically for that presentation. And I think another process that creates a third version of that EPUB for a third-party preservation system makes a lot of sense, just to be dropped into that pipeline. And so I think this research, something like a specification for how different platforms or different production processes, really even before it gets to the platform, could normalize that content for better preservation, to get something closer to the Fulcrum presentation that's better than what Barnes and Noble is allowing, not to pick on them, I think that would be a great outcome. I think it's not out of the question to keep the videos outside of the EPUB. It's useful to have them inside, because otherwise you've got to preserve the link as well, which is extra complexity in the preservation process. So that's an option too. Great. Thank you. Thanks, Rob. And Rob also just wanted to share with you that that was a great presentation, so thank you for that as well. It's time for questions, so please feel free to type them into the Q&A box, or if you'd like to make a live comment or ask a live question, just raise your virtual hand and we can unmute you. So please feel free to do that. I know you meant your comment to be a little less public, but nonetheless I'm curious, if you don't mind. I think you perhaps wanted to take up some of the risks and ideas in Karen's slides. Do you have comments on those you want to share?
I mean, I wanted to, if we have time and if there are no further questions from the audience; I think we should prioritize those. But I meant to join Karen on many of the points that she touched on in the risks and ideas categories. I think we are very much in alignment with her for this particular project. She invested a lot of work into the EPUB half and I invested a lot of work into the HTML website portions, and yet the conclusions are quite similar, the ideas are quite similar, the obstacles are quite similar. And those are that transforming the EPUB upon ingest, if you think of it as a workflow, has many downsides and many dangers and many risks, and it is difficult. I wanted to concur that it is very difficult to detect in a universal kind of way everything that can have gone wrong and everything that can be missing without combing through, you know, a lot of material manually to make sure that it does work and replay well and has all of the external resources pulled into the preservation service, so that they aren't at risk of being out there and disappearing. And appearances can be very deceiving in an EPUB or a website: it can look as if the resource is really there when in fact it is replaying remotely right now but will not necessarily be there for the long term. And so I was struck by how similar our conclusions are, even though we approached this from somewhat different angles. Interesting. Yeah. There were interesting parallels. Karen, have you got any comments on that? I 100% agree. I think that's something I noticed working with Fulcrum. I worked with maybe four or five different Fulcrum books, and there were subtle differences, and I think Jonathan mentioned that there was a choice.
I remember you mentioned that someone had made a choice to just embed the image versus using the viewer, and those sorts of choices can have impacts on what we can automate, and then all of a sudden you're doing two different versions of the workflow, you know, they branch out at that point. And that sort of thing is easier to get at on the platform, where you're making changes in different directions and you can just incorporate that switch there too. So that's why I said moving those kinds of things upstream is so much easier than trying to do it on the ingest end. Indeed. And it will definitely be interesting to see what kind of best practices and recommendations come through from this investigation. We have another question now: "Thanks for this presentation, terrific work. I'm interested in the presenters' perception about the viability of solutions at scale. The presentation seems to be based on experience with about a dozen works, all of which seem to be unique and handcrafted. Is EPUB really a standard platform that lends itself to preservation at scale, or a slightly more constrained version of the Wild West of the Internet that makes it highly resistant to preservation?" I'm happy to take a stab at this question, and other people should chime in also. So I think that this is a very astute observation, which is that an EPUB is a bundle of arbitrary links to resources and files, kind of like web pages are, so it is in fact suffering from the same or at least similar conceptual risks that arbitrary websites do. It is a little more constrained because it's not quite as programmable or arbitrary as a website or a web page. Specifically, in something like the LOCKSS methodology for things with a plugin, a plugin is meant to encapsulate the commonality of a particular publishing platform.
So in this case, some amount of iterative work can yield pretty high coverage of the features and functionalities of a particular website or publishing platform, meaning that that work is invested once and then applied more broadly to the entire platform. And there's a diminishing rate of return over time on those sort of weird features and exceptions that aren't obvious in a sample of a dozen or half a dozen works. So there's a big payoff here in having a platform that, even though it's customizable, eventually squeezes the content through the finite set of features that it has. Some of the works that we are working on for the later stages of this project are increasingly one-off, increasingly unique, where an arbitrary web presentation of a work or of scholarly research has been made. And then that work doesn't scale quite as well, because there is now no payoff from having analyzed and worked around all the widgets of a particular presentation and then moving on to the next work, which is again completely different and was built from scratch using different tools, resulting in a completely different visual and technological presentation. So I think that EPUB is one way to funnel the diversity through a particular standard. But of course, you know, one could be cynical about it and say, oh yes, well, that works so well with HTML being a standard and JavaScript being a standard, right? So there are definitely some risks there. I'll just add on to that that I think it's important also to think about the kind of cultural norms and constraints as far as the publishing world goes. So yes, you can put, you know, millions of things inside an EPUB, and you could do crazy things that would be very, very hard to preserve. But in real life we do see patterns in scholarly publishing.
We see certain kinds of things, perhaps in certain fields, perhaps with certain publishers, and that's part of what we're trying to do: to develop services and guidelines to preserve not any EPUB at scale, any set of EPUBs at scale, but EPUBs from this particular community of publishers and scholars. So I think that's part of what we're doing: maybe drawing those boundaries and saying what are the most important features that we need to be able to handle, and what's out of bounds, what is scalable, how can we help this community preserve their work better. Thank you. And thank you for that terrific question. On to the next question here, another question from Rob Cartolano, who comments that it's so fascinating to have common findings from multiple approaches. Could the EPUB Open Container Format, quote, distribution in a single-file container, potentially provide a solution to address usability and preservation? And he includes a URL here, which I will share out to everyone. I think those are the specifications, perhaps. There we go. Anybody like to comment on that? This may be something for us to look into and think about. I mean, I am familiar with the Open Container Format at a high level, but not well enough to comment on that. I think that there's certainly something appealing about a website in a box, having something that is self-contained. I guess in a way a web archive file has similarities in that respect. So I think in terms of usability and not having to keep track of all of the external links that you need to support the work, it is a useful format. And given the fact that it's an open format, and when you open it up, if you know some web languages, you can understand it quite easily, it's an attractive format for preservation. I agree also that any attempt to say this will be a self-contained bundle and it will not have any external dependencies is good for preservation, obviously.
In research, the PDF format is thought of as being that way; at least if you get the PDF, it's a single thing. But even PDFs now are containers for arbitrary external things and have widgets to replay on the inside, so there are limitations, but I think that this is a promising format. Thank you for sharing that, Rob. Looks like some grist for the mill. All right, well, we've had some great questions. And we still have a little bit of time for more questions; if there are any out there, please go ahead and share them with us. As we're waiting, thanks to our panel. Thank you so much for coming to talk to CNI about this project, and just an opportunity to let you all weigh in one last time: if you have more comments to add, please feel free to do so. And I have actually a question, since I don't see anybody else jumping up, and I'll do it by audio because it takes a little setup. So, if you go back to the age of the print book, you certainly had citations in a printed book to other works, and in preserving the work you would never think, you know, I have to preserve every other work that it cites in order to preserve this work. You also typically in the print world had a sort of negotiating process where, if you wanted illustrations, you'd look and you'd see: can I clear the rights for this picture that I might want to include, how much will it cost, how critical is it? And you might have what essentially constitute citations to paintings or something like that that you couldn't clear rights for or didn't feel were absolutely critical. So when we go to digital works, something funny goes on, because you have a tendency to want to use URLs or DOIs or some kind of actionable identifier, because they're very convenient for the reader jumping from one work to another.
I'm not at all clear that we have a language right now whereby authors and publishers can sort of signal that intent: that these really are part of the work as we think of it, and these are really external citations that are actionable largely for the convenience of the reader. Do you see any signs or hear any thinking about how authors and publishers may be able to signal those intentions as we get more serious about preserving things like ebooks? So that's a great question. In some ways, what we've done with this project is kind of just drawn a line in the sand and said that those things that are links, which is to say it's a hyperlink in the text and it takes you to another page, are not in scope for our project. You know, the assumption was that if the scholar or the publisher simply made it a link, without trying to embed it in some way so that the user or reader could experience it in the flow of the work, then it's not part of the work in the same way. We could absolutely reexamine that, and I think that there's some utility in thinking through that assumption in a more rigorous way, but that's kind of the line that we drew. On the other hand, you know, with a lot of these works, publishers have gone through a lot of trouble to present video and data visualizations and, you know, other kinds of data in context within the work. And the ones that we showed today were relatively straightforward, but a lot of the works that we've been examining in the later sprints are quite complex. And so what it means to be presented in the context of a work is something that might be very difficult to do in another way. And so it's very important to maintain that flow and those relationships. Anyone else? Anyone else have any... I'd like to, if I could, follow up. Yeah, I think those are great points.
And I think it's a conversation we have over and over again with our authors, because of our ability to host digital resources with the book. So there can be a misconception that that means we should have their entire research archive hosted on the platform, which is not what we're trying to do. We're trying to do something that more directly supports the argument that they're trying to make, or illustrates it, you know, serves the equivalent function of a chart or a table that you put in a book, except this is a real-time media object, right? And so if your book is about a musician, we don't want their entire archive of all of their output, right? But the question of whether it's directly supporting the argument or not really comes down to an editorial choice. The editor has to tell me whether they think it's vital to the work or not, and whether it makes a difference if it's embedded in the flow of the text or whether there should be a hyperlink. So while, you know, those choices are constrained by our ability to support them technologically, they're not technological choices; from my point of view, they're editorial choices. Yeah, I agree. I think the definition of what should be contained in the work is, for us, given to us by the publisher in some way, in what is in the package. But it's interesting to think about the idea of how to express compound works, and where the boundaries around those are is generally, I think, a challenge. I mean, does the AI already work. But yeah, I worked on the links between compound works, and what we found is that in some ways what's contained in a compound work is up to the scholar in the moment they're creating them, so it's a little bit subjective. Thanks. That's really interesting, and it's helpful in situating where those choices are made and how they're signaled. Yeah, thanks. Thanks so much for that question, Cliff. And thanks, everybody. What a great panel.