All right, thanks everyone for coming. I think we'll go ahead and get started. I'm Andrew Woods from DuraSpace, and along with Rosalyn Metz from Emory and Simeon Warner from Cornell, I will be introducing the Oxford Common File Layout initiative. Hopefully in the process we will convey the value and importance of having a specified layout for preservation persistence. This is possibly a little more technical than some of the discussions that have been happening, but I think it represents a real opportunity for long-term savings for institutions, a sane organization of your digital content, and ultimately long-term stability for a part of the mission that we as memory institutions are responsible for.

When it comes to digital preservation, there is maybe a little bit of a gray area in terms of: are we doing enough? Let's say, for example, that at a given institution I do the things I should be doing from a digital preservation perspective, right? I have regular backups, maybe nightly, maybe weekly. I also back up my database. I have multiple copies of my content, and that content is distributed across different geographic regions on different technology stacks, and either my cloud provider or my file system does fixity checking for me. Great. Is that sufficient? Maybe from a disaster recovery perspective; from a digital preservation perspective, there may be an open question there.

Taking it a step further, let's say that I am very diligent about implementing the guidelines provided by the NDSA Levels of Preservation. I additionally make sure that my content has access controls, so that no single person has write access to all of the content. I maintain logs somewhere of who is acting on the content, and when, and what they are doing. I collect all the necessary metadata: the descriptive, the technical, the preservation metadata. I have documented format types for my master copies. So I do all this, and on top of that I'm very interested in the OAIS model and I try to have my architecture informed by it. But with all of that, I still have certain concerns: am I actually doing enough?

Talking a little bit about those concerns on a continuum: say, for example, I come across one of my digital objects that seems to be broken, or worse yet, one of our users reports that an object they were expecting is broken. What recourse do I have? What can I do about it? Do I have to restore the state of my preservation repository from last night, or from a week ago? Can I actually look on disk and see what might be wrong? Is there anything there for me to look at?

Moving down the continuum, let's say that we've decided not to use a particular application anymore for our preservation repository, and we want to move to a different application. Does that mean I need to migrate my 7.3 petabytes of data because I'm using a new application? Do I need to move and reshuffle my content just because we're making an application choice?
And will I have to do that all over again the next time we make a new application choice? Further down the continuum, and I would hope this is a worst-case scenario: picture the smell of burning tires, where what were buildings is now a pile of concrete and rebar surrounded by a smoldering wasteland. But in the distance you see an arm bursting through the rubble, holding the hard drive, the hard drive that contains the content of my preservation repository. The question is: if I have that hard drive, is it, in and of itself, complete and meaningful? Can I plug it into a POSIX server and, with basic tools, actually inspect it and rebuild civilization? Is there enough information there, in an intelligible form, or do I need some special application that is simply no longer available?

What this suggests is that there is an element missing, and I would like to think we are just at the phase where this is becoming crystal clear; I would like to think it is actually relatively self-evident. There is an opportunity, and a need, for a simple, maybe self-evidently simple, non-proprietary, open-standards approach to the layout of our preservation persistence. Ideally this specified layout represents the bedrock of how our content lives, and then all of the applications, which are actually much more transient than our content, can be built against that specification. Next year we can throw an application away because there's something shinier, and that in turn is built against the same specification. So this represents the bedrock for our content, and it also opens up the door to working within a community, which is something we've heard about: a community that builds tools that all work against that same content.

Giving a little bit of background: these same types of conversations arose in the fall of 2017 at an unrelated meeting at Oxford, where many digital preservationists were talking about, well, this among other things. One of the outcomes of that meeting was an articulation of the state of this issue by Andrew Hankinson at Oxford. That document was posted to the PASIG mailing list, and after some back and forth there, a couple of months later the first community meeting was held, which was recorded, so you can look forward to that if you weren't there. That meeting involved over 30 institutions; the conversation was furthered and use cases were gathered. Ultimately, more or less a year later, after monthly recorded community meetings and weekly editorial meetings, we have produced two iterations of a specification for the Oxford Common File Layout, and I will invite Rosalyn to describe those.

So the Oxford Common File Layout is actually two documents: if you go to ocfl.io you'll see that we have both a specification and implementation notes. The specification itself describes objects at rest: what one expects to see should they simply come upon an object laid out according to OCFL.
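To make "objects at rest" concrete, here is a minimal sketch of what a single object inside an OCFL storage root might look like on disk. This follows the spirit of the draft specification; the exact marker-file names, and the way an object identifier maps to a directory path, are illustrative assumptions, since path layout is left to the implementation.

```
[storage root]
├── 0=ocfl_1.0                 <- declaration file naming the spec version (illustrative)
└── object-01/                 <- object root; id-to-path mapping is up to the implementation
    ├── 0=ocfl_object_1.0      <- declares that this directory is an OCFL object
    ├── inventory.json         <- inventory of all versions of the object's content
    ├── inventory.json.sha512  <- digest of the inventory itself
    └── v1/
        └── content/
            ├── image.tiff
            └── metadata.xml
```

Even with no OCFL tooling at all, someone at a POSIX shell could list this structure, open inventory.json, and understand what the object contains, which is exactly the rebuild-civilization property described above.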
The specification discusses primarily the objects: that includes the structure of the objects and the inventory, which inventories each object's content. One thing to note is that OCFL does not make any statements about what should be contained within your objects, only that an object should include both the files and the metadata, whatever those files and that metadata are at your own institution. We do not try to push a common object model onto institutions. The specification also discusses the storage root and how objects are laid out together, and we provide examples in the specification to help illustrate the use of OCFL.

In the implementation notes, we outline best practices for the objects in motion and provide advice for implementing the specification. This includes guidance on digital preservation; key recommendations for keeping within the spirit of OCFL (we had many, many conversations around these, and have become very close because of them); storage, including how content should be stored and how objects should be handled as they move back and forth within that storage; and client behavior, meaning expectations for what you might see from a client that lays OCFL out within a storage infrastructure.

Some of the benefits of OCFL are completeness, parsability, robustness, versioning, and storage diversity, and I'll go a little more in depth into each of these.

First, completeness. The complete intellectual object is stored together with its metadata. Again, we do not prescribe what that metadata is, or even what those files are; we leave that up to the institution. OCFL does fall in line with existing standards like TDR, the NDSA Levels of Preservation, and OAIS, but it's important to note that while those standards provide the foundation of much of what we do in repositories, they lack exact prescriptive methods, and that is something OCFL makes up for. It also allows for ease of mapping from one system to another, as you'll see as we talk through the other benefits.

Another benefit is parsability. In disaster recovery situations, we want to make sure that humans can read and understand the content itself. That can be Andrew's hand coming up from the rubble, but it can also be a failure of your repository: your repository ceases to exist for whatever reason. So we wanted to make sure that humans could read it. Additionally, we wanted to make sure that machines could read it, which again helps in disaster recovery: you can place a simple application on top of an OCFL storage root and read through it, conceivably without a lot of effort.

The benefits of OCFL also include its robustness. We have strong fixity built into the specification, and we talk about it within the implementation notes as well. One of the things you'll see on, I believe, the next slide is that we keep hashes to address the content within OCFL. Those can be used for fixity checking, but they can also be used simply to identify files, and we'll talk a little more about how that can reduce how much storage you need. Content can be easily validated using the inventory; this was built in deliberately, because we wanted to make sure that every time you make changes you can validate your content. And objects can be completely self-contained, so you can keep an object all in one place, or, if you want, you may be able to reference other places as well.
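As a hedged illustration of how the inventory enables that validation, here is roughly what an inventory.json might look like, with two versions of an object. The identifiers and timestamps are invented for the example, and the hashes are truncated for readability, just as on the slide.

```json
{
  "id": "urn:example:object-01",
  "digestAlgorithm": "sha512",
  "head": "v2",
  "manifest": {
    "4d09b1...": ["v1/content/image.tiff"],
    "9f86d0...": ["v1/content/metadata.xml"],
    "a3b1c4...": ["v2/content/metadata.xml"]
  },
  "versions": {
    "v1": {
      "created": "2019-01-01T12:00:00Z",
      "message": "Initial ingest",
      "state": {
        "4d09b1...": ["image.tiff"],
        "9f86d0...": ["metadata.xml"]
      }
    },
    "v2": {
      "created": "2019-02-15T09:30:00Z",
      "message": "Corrected metadata",
      "state": {
        "4d09b1...": ["image.tiff"],
        "a3b1c4...": ["metadata.xml"]
      }
    }
  }
}
```

Notice that v2's state still references image.tiff by the same hash, yet the manifest stores that file only once, under v1/content. Validation amounts to checking every manifest entry against the file on disk, and the single stored copy is the forward-delta behavior described next.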
Versioning is probably the piece of the specification that most people find interesting. Changes to the object are tracked over time, and we use forward delta versioning to reduce the amount of content stored, which is what I referenced on the previous slide. Using forward deltas means that multiple copies of a file do not need to be stored over and over again; many systems that do versioning will version the entire object and keep duplicates of files from one version to the next. Here OCFL nods to Moab, one of its predecessors, which was developed at Stanford.

Previous versions of an object can be reconstructed using the inventory.json file: you can go into the inventory and actually reconstruct a previous version of the object should you need to. Again, this was something Moab did very well, and we tried to keep in the spirit of that while making some improvements on Moab. You'll see off to the side a copy of the inventory.json file, and you'll notice truncated hashes: we use SHA-512, and those digests are really long, hence the truncation in examples.

One of the last benefits is storage diversity. OCFL is designed to work with various storage infrastructures, including the object storage that is prevalent in cloud offerings. Multiple institutions, both within the editorial group and in the wider community, have a need to use storage offered by Amazon in particular. It also supports the conventional file system metaphor, and you'll see that metaphor used throughout, largely because humans, or at least most of the humans in our industry, think that way. And as I mentioned, OCFL can be implemented to ensure deduplication of content, which could lower your overall storage costs, especially in a cloud environment. So with that I'm going to hand it over to Simeon, who will talk about the status of OCFL moving forward.

It's my pleasure to say a few words about where we are and where we're going. Andrew noted earlier that the first draft of the specification and implementation notes was released in October of last year. Since then we've had some really great community feedback, and we're closing in on a beta; you can track that through our GitHub repository and the issues tagged as to be resolved before beta. We hope to get there in the next couple of months, partly because we have an Open Repositories presentation, and it would be great to have it out for that.

One of the ideas of a beta release is to allow a period of stability with something we think is a plausible candidate, and to have that stability allow for implementation and testing work. People have already started playing with the specifications as they are. Perhaps the most robust work so far is from Johns Hopkins: there's a client in the Go language which is pretty solid. Also in Go, there's work from Oxford associated with the Oxford Research Archive, focusing on both a client and an underlying API that can be reused. There's some experimental work from UW Madison using Java. The specification uses examples, and there is work on test fixtures based on a reference implementation we've been building at Cornell. Stanford have aspirations of doing work within their Stanford Digital Repository, in Ruby, perhaps this summer.
Or at least Julian is hopeful; maybe he can persuade Tom. And the Fedora community has been looking at OCFL as the underpinning for persistence in archival systems as part of Fedora 6. It's good to be able to say all that, but wait, there's more.

I had designs on giving you lots of crazy pictures of what might be possible, but my more level-headed companions said, "No, Simeon, we'll let you put up a few words, and then you can speak a little about what might happen in the future." We have purposefully kept the scope of version one quite controlled, to a set of features which we were pretty certain the majority of the people engaged in our community wanted and needed, and which we felt we could specify with reasonable certainty that we weren't doing in a bad way. However, we have an interesting set of use cases in one of the GitHub repositories under the OCFL organization, and these cover things like the following.

What about a packaged-up object? If you have objects with many small files, they are both inefficient to store and hard to fixity-check file by file. So what about a tar or zip file for a whole object? What rules would be necessary to understand the presence of such an object in a storage root alongside other sorts of objects? How would such an object be updated?

Another option might be packaging up each version's content. This has the same benefit as the previous option, in that you don't have lots and lots of small files, but it adds the benefit of maintaining the immutability of any individual version once it's written. Once again, though, what are the rules around that? How would it complicate access, checking, and updates?

What if an OCFL object could be stored with different versions in different object roots, or what if some of the content of an OCFL object were referenced in other systems? How could we handle that in a way that still gives sensible notions of validation and completeness?

Then there's replication. One of the things about OCFL is that it's this blindingly simple arrangement of files, and collections of files, in a structure that holds a set of objects, so obviously you can do replication with simple file system operations. But as soon as you start doing that, you start asking: how do I know my copies are in sync? Am I doing it efficiently? Am I copying just the new things? Are there additionally specified aspects we should think about that would help a system maintain replicas, and help us have shared tooling around that?

OCFL is sufficiently close to some existing approaches, notably Moab, that in-place migration might be a possibility. You can observe that there is no way to instantly migrate 300 terabytes of content; in fact, any way to migrate 300 terabytes of content is going to be kind of painful. Are there ways we could help manage a gradual migration and support an ongoing understanding of good state during that process?

Up until a few years ago, SHA-1 was a good checksum; it's already not a good checksum. SHA-512 is currently a good checksum, but even with standard advances in computing that will no longer be the case some years hence, and if we suddenly get good quantum computers, all bets are off. So there will be changes. We have an extensible mechanism built in, but from a specification point of view, how do we manage the set of checksums that common tooling should support? And then there's the open slot for your community's needs here.
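Because an object is just files plus that JSON inventory, shared tooling for jobs like replica checking can start very small. As a rough sketch, assuming only the inventory fields illustrated earlier, a basic fixity check over one object might look like the following Python; a real client would do much more (verify the inventory's own sidecar digest, validate version directories, stream large files, and so on), and the object path at the bottom is hypothetical.

```python
# Minimal sketch: verify every file listed in an OCFL-style inventory's
# manifest against its recorded digest. Field names follow the inventory
# example shown earlier and are assumptions, not the normative spec.
import hashlib
import json
from pathlib import Path


def validate_object(object_root):
    """Return a list of problems found; an empty list means the object checks out."""
    root = Path(object_root)
    inventory = json.loads((root / "inventory.json").read_text())
    algorithm = inventory["digestAlgorithm"]  # e.g. "sha512"
    problems = []
    for expected, content_paths in inventory["manifest"].items():
        for rel_path in content_paths:
            path = root / rel_path
            if not path.is_file():
                problems.append(f"missing file: {rel_path}")
                continue
            # Reads the whole file for simplicity; chunk this for big files.
            actual = hashlib.new(algorithm, path.read_bytes()).hexdigest()
            if actual != expected.lower():
                problems.append(f"fixity mismatch: {rel_path}")
    return problems


if __name__ == "__main__":
    report = validate_object("object-01")  # hypothetical object root
    print("object valid" if not report else "\n".join(report))
```

Run against a replica, the same check gives a quick answer to "are my copies intact?", which is the kind of shared tooling those community needs might grow into.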
The idea is that we're building a specification, and hopefully a community that will build tooling, around a set of shared needs. So far a certain number of people have engaged with us, and perhaps there are other opportunities that we don't yet know about.

So, now that we've convinced you that this is potentially interesting stuff, what should you do? If you want to watch, the first place to watch is the community list; there's a link to the group on the slide, and obviously the slides will be available. It's pretty low traffic: it announces meetings and shares notes from them, and that's a good way to follow along. There's also the ocfl.io website, which has the current releases and drafts, plus links into the GitHub organization.

If you want to get involved a little more heavily, Andrew already mentioned the monthly calls, which are announced on that list. He also noted that they're recorded, so if you happen to miss one it's easy to catch up. We have a fairly low-volume Slack channel, and everything we're doing is within the OCFL organization on GitHub. We'd certainly love to hear of new use cases, or comments on existing ones. And of course there are the current draft specification and implementation notes: we'd love review, discussion, and issues, and if you're really gung-ho, implement it. I think with that we have time for questions.