I think we're holding to time, so I'll go ahead and get started. It's great to see everyone through the glare of the lights. I'm Andrew Woods from Harvard University. And I'm concerned. A little bit concerned that after lunch, some of you are going to slump over and fall asleep. But I'm more concerned that, given the explosion of data, the increasing demands on our limited resources, and the criticality of making environmentally responsible decisions, we're at a point where we need to reconsider our approaches to long-term digital preservation.

So for a moment, imagine a future. Dream with me, but don't close your eyes. The sun is shining. Birds are twittering in the branches. And there's a consistent storage fabric across the enterprise, sustainably maintained with preservation sensibilities built in, that facilitates the flow of content across the enterprise in support of the research, teaching, and learning mission.

Drawing from one of the guideposts in our strategic vision, we at Harvard Library see digital preservation as one of the greatest opportunities for collaboration between institutions that are inherently committed to ensuring that information is not only accessible and usable today, but far into the future. Although it's not guaranteed, I'm confident that using our community superpower of collaboration, we can achieve that vision. My primary purpose in standing up here today is to paint a clear picture of what we're currently doing at Harvard Library and what we plan to do, so that you can see the entry points for collaboration. So I'm going to make one request here at the outset. I'll probably make more requests later on.
The request is that, during the course of this session, you actively look for opportunities to see yourself in this picture and opportunities for collaboration.

We're talking about digital preservation, and as it happens, Harvard has a digital repository service. We creatively call it the DRS. Personally, I like to view the DRS as the university's digital treasure chest: it has all kinds of amazing things coming out of it. We're also feeling a whole range of pressures related to the DRS: pressures around cost, provisioning storage, maintaining the software, and adding new features to the preservation system. We're also challenged to integrate the preservation system with the rest of the ecosystem: getting content in, feeding the backlog of content into preservation, and getting it out to other services. And all of this, as mentioned, we need to do in the context of making environmentally responsible decisions. In terms of these pressures, the train is only picking up speed.

Take a look at this graphic, which depicts the growth of the repository over the last decade or so. The blue line is the number of files in millions; the orange line is the size of the repository in terabytes. What you can see is a roughly linear growth trend in files, but the discerning eye will notice that in terms of size, there's a hint of exponential growth. And given our current mass digitization projects with AV material, and the ones planned within the next few years, the size line will jump by an order of magnitude.

But don't despair: we are actively responding to some of these pressures. We recently completed a significant storage migration, getting off of a POSIX (Isilon) storage solution and moving, interestingly, to an object storage solution using the S3 API. And when I say S3 API, it's completely reasonable that the word Amazon pops into your head.
I'm not talking about Amazon; I'm talking about the general S3 API, which lots of different storage solutions implement.

As part of this migration, we've realized a number of benefits. A big one is cost savings. The DRS has a chargeback model for units within the university that deposit into the preservation system, and on an annual basis, this migration has afforded, I think it's 48%, almost a 50% annual price decrease for depositors. Additionally, on-demand extensibility has also brought down costs: we don't need to forecast and pre-provision storage; we simply pay for storage as we use it, a common object-storage benefit. And a final point around flexibility: we're able to plug in different stores, optimized for different purposes, throughout the process. An interesting consequence of that flexibility is that it lets us make decisions about which provider we want to use. Specifically, we now house two of our storage targets in the Massachusetts Green High Performance Computing Center, which has a significantly lower environmental footprint because it's hydro-powered.

I want to characterize all of this as a building block toward the vision: the notion of an abstract API. It happens to be S3 in this case, but the common principle is abstracting the software from the underlying storage.

As part of that migration, we also took the opportunity to restructure our content in alignment with the Oxford Common File Layout (OCFL) specification. And as tempting as it is, I'm not going to go into the details of the OCFL specification.
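The storage-abstraction building block can be sketched in a few lines. This is not Harvard's actual DRS code; the class and function names here are invented for illustration, and the in-memory backend stands in for a real S3-compatible store. The point is the principle: the repository software codes to one interface, so swapping storage providers is a copy operation, not a software rewrite.

```python
# Sketch of coding to one storage interface with swappable backends.
# Names are illustrative; this is not the actual DRS implementation.
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Minimal S3-like interface: the application codes to this,
    never to a concrete storage product."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def keys(self) -> list: ...

class InMemoryStore(ObjectStore):
    """Stand-in backend. A real deployment might instead wrap an S3
    client pointed at any S3-compatible endpoint (AWS, Wasabi,
    on-premises object storage)."""
    def __init__(self):
        self._objects = {}
    def put(self, key, data):
        self._objects[key] = data
    def get(self, key):
        return self._objects[key]
    def keys(self):
        return list(self._objects)

def migrate(source: ObjectStore, target: ObjectStore) -> int:
    """Because both ends speak the same interface, moving between
    providers is a copy of content, not a rewrite of the software."""
    count = 0
    for key in source.keys():
        target.put(key, source.get(key))
        count += 1
    return count
```

With a real client library such as boto3, the same idea shows up as pointing the S3 client's `endpoint_url` at whichever S3-compatible provider you choose.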
I appreciate that Simeon gave a bit of an overview in yesterday evening's lightning talks. What I will say is that by aligning with OCFL, we inherit all of its benefits, which include completeness. And when I say completeness, I mean, and this is an important point, that at the storage level, everything that constitutes our repository is persisted completely. We can rebuild the entire repository just from what's on disk, or in our case, what's in the object store. All the data and all the metadata are there. We could wipe away the databases, the caching, and the application on top, and still rebuild the repository. OCFL gives us that completeness.

It also gives us robustness: fixity information for the content in the repository is built into the structures of OCFL. We get versioning. We get parsability: it's a specification, so machines understand it, if you tell them to, and humans can look at the content directly, because it's self-describing and intelligible. And we get storage diversity, and our migration is a great example of that: having moved to OCFL in object storage, we're not limited to writing to a file system; we can also write to object storage.

I'll also call out the notion of decoupling the storage layout from the repository software. What this enables, and this is pretty exciting, is the ability to replace the software that sits on top of the storage. There's a common pattern that I think we've just become used to, and somehow accepted, that every time you want to change the software on top of your preservation solution, of course you have to migrate your content.
As that content grows larger and larger, that becomes more and more ridiculous. The stability of the OCFL specification allows us to replace the software independently and leave the content in place. So a second building block is aligning with a specified persistence layout, OCFL in this case.

Now, as building blocks do, these two let us add a third on top: policy-driven replication. The granularity, the completeness, the file-based nature of OCFL allows us to surgically define policies related to the content in the repository and, at a very granular level, down to the file, replicate content according to the policies that curators set. This also brings down cost: we're not taking the entire corpus and copying it to three or five different storage locations. Based on the nature of the content being preserved, curators define the policies, and replication happens accordingly.

Extending that same tooling and those same principles, we replicate not only for preservation: for content that has been flagged as deliverable, we also replicate to storage that implements the S3 API (Wasabi, in this case) and is optimized for delivery, to feed our access systems, whether that's image delivery, AV content, or plain files. So, third building block: policy-driven replication.

Now, what I will not do is talk through the details of our DRS objects in OCFL. This is an actual object, or at least a subset of one, in the DRS.
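The shape of policy-driven replication can be sketched as follows. The policy names, tags, and store names here are all invented for illustration; the real DRS policies are curator-defined and richer than this. The point is that each item is copied only to the stores its policies name, rather than the whole corpus going to every location.

```python
# Sketch of policy-driven replication: curators tag content, and a
# policy maps each tag to target stores. All names are illustrative.

# Each policy lists the target stores for content carrying that tag.
POLICIES = {
    "archival":    ["preservation-a", "preservation-b"],  # dark copies
    "deliverable": ["preservation-a", "delivery-cache"],  # plus access copy
}

def replication_targets(item: dict) -> list:
    """Resolve where one item should go, given its curator-set tags."""
    targets = set()
    for tag in item.get("tags", []):
        targets.update(POLICIES.get(tag, []))
    return sorted(targets)

def replicate(item: dict, stores: dict) -> list:
    """Copy the item's payload only to the stores its policies name,
    not the entire corpus to every location, which keeps costs down."""
    written = []
    for name in replication_targets(item):
        stores[name][item["id"]] = item["payload"]
        written.append(name)
    return written
```

A delivery-optimized target (Wasabi, in the talk's example) would simply appear as one more named store behind the same S3 interface.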
But I will plant a seed, and this is an opportunity alert: if you or members of your team are interested in working with specifications and content modeling, there's a gap here that would be wonderful for us to fill collectively. It's defining a mechanism for specifying the meaning, the semantics, or even the structure of the payload of an OCFL object, which all of our systems currently do differently, so that those objects can be bootstrapped by a higher-level application. I won't belabor it, but if you're interested in this sort of specification work, please get in touch. There's a gap here to be filled.

What I'd like to do now is tie up some ideas and make some observations explicit. One: for long-term digital preservation, it's important to have a specified, non-proprietary, transparent persistence layout. Two: applications come and go, and I think we're seeing them come and go at an increasing rate, but it's the data that's the bedrock. Three: as a community, we have a collective opportunity, and I might say responsibility, to standardize our systems' interactions with the preservation persistence. What I mean is: if we're working with vendors, let's collectively influence them to write to specified persistence, as opposed to their own proprietary, idiosyncratic persistence layer that you can't do anything with unless you have their software. If we're working with open-source solutions, let's influence those to write to specified persistence. And likewise if we're building the code ourselves.

At this point, you might be saying: okay, I've got it, three building blocks. We have the S3 abstraction, we have the OCFL specification, we have policy-driven replication. We're good. Yeah, we're good. But I hope you left room, because that was the first course.
Now, moving on, and continuing to address some of the pressures we're feeling: our beloved DRS is about to undergo a full revision. It's a three-year process, and we're relatively early in the first year. I'll quote again from Advancing Open Knowledge: we plan to modernize and rationalize our repository approach "with the goal of disentangling preservation, asset management, and access while exploring opportunities for interoperability between systems." This is exactly what we've been talking about.

So we're early in the first year, the discovery phase. This is blue-sky thinking: what's the art of the possible? What's the dreamiest repository we can imagine? Year two brings it down to what's actually doable, and year three is: do it. And for your later review, this is the dream team actively working on making this happen.

Let me transition into how things stand currently. This is not an architecture, just a conceptual diagram, and it denotes some of our storage and service silos. Like many of us, we support a full spectrum of services in our ecosystem. There are library and university systems with their own transactional storage, whether that's records management, archival management, a data repository, or our open access systems. We also have infrastructure, tooling, and support for ingesting, transferring, and transforming content. And then there's the preservation aspect of the ecosystem, which we've been talking about up to this point.
We're making progress on the notion of a consistent storage fabric with preservation sensibilities built in: content that comes into the preservation system flows through the S3 abstraction, gets replicated according to archival policies, and also gets replicated to storage optimized for access, with our delivery systems feeding off of that fabric. But going further to the right, in terms of research computing, there are storage clusters over there, and research computing is somewhat divorced from the rest of the ecosystem.

Moving more toward the vision: here we cross the line between what we're actually doing and what's possible based on the building blocks we've put in place. In terms of creating a storage fabric across the enterprise, we can extend the notions that apply to our access and discovery systems. We could, based on policy set on content, replicate to storage that is accessible to research computing. Or, if we're talking about a truly consistent fabric, maybe research computing can reference content stored under other policies. Basically, connecting research computing, high-performance computing, into the fabric can support researcher use cases, or, as was discussed in yesterday evening's AI/ML session, library use cases: enabling richer and more novel discovery by doing more comprehensive OCR, extracting transcripts, and doing sentiment analysis, processing that we can more easily run on the content. The same goes for data transfer.
If data transfer is tied into the same storage fabric, we can widen the pipe for getting content into the preservation services. And, a little more ambitiously, for the library systems we currently use: to the degree that they could natively write to a specification, OCFL for example, and the fabric they're writing to has inherent preservation capabilities, that opens the door to reconsidering how digital preservation is handled for other types of systems as well.

All that said, I think the vision and the possibilities are fascinating, and they open the door to making our digital preservation infrastructure more environmentally responsible and more flexible. If you're interested in simply following the progress of the project, please register your interest on the form behind this link. And maybe even more importantly, if you're interested in engaging in the conversation, the dialogue of solutioning some of this, please register that interest as well. Here we have contact information for the DRS Futures project, for the OCFL initiative, and my own details. And with that, I will say: thank you for not falling asleep.