 as soon as you're ready. Thank you, Mike. This is Kristen Leis at Heritage Preservation in Washington, DC. It's almost 90 degrees here today. And we've been hearing weather reports from across the country and around the world in the hello box. Thank you for that. I want to just very quickly do an introduction. We have a lot to cover today. And I don't want to take away from that. But again, this is caring for digital materials, preventing a digital dark age. And we want to thank Learning Times for producing this for us and especially to the Institute of Museum and Library Services for making funding possible for these courses. We've had tremendous interest in this course. We've got over 200 people logged in already. And we've had almost 400 on the last few webinars. So welcome, everyone, to that. This is our fourth class today in this Care of Digital Materials course. We have Jefferson Bailey about practicing safe archiving, backup, copies, and what can go wrong. And then we will have our last class on Monday, April 15. And we look forward to that. If you haven't yet done so, please do look at the course web page. We put links to the PowerPoint slides, where you can find homework if you've missed an assignment and you want to link to it. You'll find that there. We've got lots of resources that our speakers have pulled together for you. So pretty much anything in their PowerPoint, they've already got linked to on the site. And further information, we've had great conversation as we've been going along. And we're going through your questions and comments. And all of those will be getting up there. So keep checking back on this page. And hopefully you're getting our emails with the recordings after each webinar is broadcast. If you are working towards our certificate or digital credential, just remember you need to register, turn in your permission form, watch each webinar on the course, either live or on the recording, and complete all five homework assignments. 
And make sure that's all done before Monday, April 22. Danielle will probably go over today's assignment and the assignment she made yesterday on Monday, just to give you a little feedback on it. But hopefully by the end of today's session, we can also talk about what the homework assignment is for this session. And we welcome you to join the community. The Connecting to Collections online community, if you become a member, then you can join the discussion boards and ask questions of your peers. We're about six people off of hitting our 3,000-member mark. So if you can be that 3,000th member, we do have a special prize that we're lining up for that. So if you are a member, we invite you to ask your colleagues to join you on the online community. And then, as always, if you have any questions, please contact Heritage Preservation and we will do our best to help. So I want to introduce today's speaker, Jefferson Bailey. And he is working with the Metropolitan New York Library Council, or METRO, on their strategic initiatives, which include program development, research and publications, new technologies, member services, and other events and programs. Before 2012, when he joined METRO, he worked on digital preservation programs for the National Digital Information Infrastructure and Preservation Program, also known as NDIIPP by some, and the Digital Preservation Outreach and Education Program, some people call that DPOE. And those are both at the Library of Congress. He received his Master's in Library and Information Science with a specialty in archival studies from the University of Pittsburgh and his undergraduate degrees from Oberlin College. And I do also want to thank Danielle Plummer for being with us today. She's our course coordinator. And she will also be assisting Jefferson in answering your questions in the chat box. And when he takes his break for questions, she'll be helping with that as well. 
So I'm going to close my PowerPoint, bring up Jefferson's, and then close our hello box. And we'll start our moderated chat. And as Mike mentioned, you will notice that if you post anything there, you'll see it twice, once when you've let us know and once when we publish it to the group. So we'll move that away. And if you had any tech issues that you mentioned in the hello box, we'll make sure to deal with that. So I'm going to bring you back to the beginning here. Jefferson, this slide one? Yep. OK, great. I'll turn it over to you. Thank you. OK, great. Thanks, Kristen. And I also want to say thanks to Danielle who will be helping out today, and say thanks, of course, to Heritage Preservation, who's putting on this great series and IMLS, who is, of course, paying for it. So what I am talking about today is practice safe archiving, backups, copies, and what can go wrong. So we'll be talking about digital materials, obviously digital preservation, as well as how it ties in to physical media, which is, of course, how we store all our digital information, what can go wrong, and what actions we can take to address those issues. So I'm just going to move ahead. So this is sort of to talk about the whole series at large, the caring for digital materials goals. Participants have a better understanding of the inherent fragility of digital objects. We'll definitely be talking about that today. Participants will acquire information to help them select preservation formats, metadata, and backup systems for digital objects. And some of them, previous presenters in the series talked about that, and we'll be talking about backup more today, but of course, all of these things will be touched upon. And the third is participants will be able to identify one or more actions that can be taken to improve their institution's digital preservation efforts. We'll definitely be talking about that today. So next slide. And there you see the previous ones. 
So those will be available online, and we'll be putting this one up. And Liz Bischoff and Tom Clareson will be talking next week about digital preservation and collaboration. So our outline for today's session is basically going to be in three parts. So the first part, I'll talk about physical media and digital information. And this will talk about some of the challenges, as well as the actions that can be taken to preserve digital information that lives on physical media. Part two, we'll be talking about backup and storage. And then part three, we're going to talk about the levels of digital preservation project. And this is a project that got started when I was at Library of Congress out of the NDSA, which is the National Digital Stewardship Alliance. And that's a consortium that LC oversees of people interested in digital preservation. And the goals of the Levels of Digital Preservation Project, and I'll talk about it more when we come to that part of the presentation. But it's generally intended to be an accessible guide for institutions of any type to try to undertake digital preservation practices and actions. Next slide. So I just want to start off with a couple sort of high level slides, but I'm going to try to keep all of this very accessible. But I think throughout the whole series of these presentations, we always need to keep in mind what is a digital object, what is digital information, and the many different forms that it takes, and the many different ways that we interact with it. So you'll see some pictures here. We obviously have floppies and drive. Digital information is dependent on physical objects and storage. We see some ones and zeros. That's a sort of familiar way that we think about digital information as being binary is what that's referred to. If you open some images say a JPEG in a text editor, which is the third little image, it basically looks like gibberish because a text editor doesn't know what to do with an image. 
And then obviously we have images of great things like a baby walrus. And that can be represented in software. It can be represented in code. So I mean, the point of this slide is just sort of to remember that a digital object and preserving that digital object requires us to interact with all of these different layers. And so the physicality and what we'll be talking about today is that digital information is, of course, stored on floppies and it's stored on hard drives and our computers and servers somewhere. So they're all very dependent on physical material and this is actually a highly magnified microscope shot of the surface of a hard drive. And you can see these are actually what we call bits. A bit is sort of the component piece of a digital object. It's a 1 or a 0. And you can see peaks and valleys. And so the peaks are basically 1s and the valleys are 0s. And that's how digital information is read by your computer and it goes through many different translational states until it comes up essentially on your screen as an object that you can look at and understand. And so CLIR and NARA sort of got together a number of years ago to think about these same issues that I'm talking about. And they came up with this ontology of digital objects. And so ontology basically just means what are the parts of a digital object. And this sounds a little fancy and high concept, but it's important to remember that when you're doing digital preservation you are addressing each of these different parts of an object. So you have the physical object, which is, of course, the floppy disk or hard drive. The logical object, which would be code and software and applications and programs that help you understand what is on that physical object. And then you have a conceptual object. And that is, of course, an image that you can look at or a document that you can read. And then the fourth, and I've sort of added this, is the conditional object. 
And to do digital preservation, and this will come up throughout the presentation, we really need to stay mindful of the fact that each institution has its own different level of resources, levels of expertise, level of IT support. And what we try to do as digital preservationists is make sure that we can do the best that we can with what we have. But what we have will always be very different. And that really drives a lot of the digital preservation conversation and best practices. So we'll start with part one and talk about what can go wrong with the physical object. And so these are the disks and the drives, as I mentioned. And there's a number of things that can go wrong with them. Obsolescence, so that just means something is obsolete. So it's no longer being produced. Or the machinery required to access it is no longer being produced. So this is very easy to think about with a five and a quarter floppy disk. The old big floppy disk. Obviously, those are no longer built into computers. They're no longer in our laptops. So if you have a collection on five and a quarter floppy disks, it's very difficult for you to get to that information. Access, so obviously physical items can degrade over time, just like analog or paper can fall apart. Digital media has an even shorter lifespan than most physical documents. Appraisal, obviously it's very difficult to understand what is on a piece of physical material without accessing it. And then authenticity is just the idea that you need to ensure that what is on that media stays the same even when it comes off of that medium. And this is true for paper, too. I mean, Xerox copies, microfiche, these are all attempts to preserve authenticity of content regardless of what format it's on. And that's true for digital materials just as well. And so I want to have a quick poll when we talk about physical formats and just get a sense of what people either are collecting or what they have within their institutions. 
And I've sort of added a range of options here. A number of different types of floppy disks, Zip and Jaz disks, which are both Zip disks, essentially. And then SD cards, which are like the little things you put in your computer. And then tapes. Optical is a very popular one, so CD-Rs and DVDs. And then, of course, external and internal hard drives, which we're all familiar with. And then online servers. So when you save it to your network drive, that is actually just another drive that happens to be attached to a network. So wow, a lot. This is a great diversity. A lot of three and a half, a lot of tape. We all probably have VHS and audio tape. And definitely a lot of optical, which we were certainly expecting. And a very large number of hard drives. So yeah, we'll give people another minute. But it's great to see such a diversity, especially in some of the older formats, like five and a quarter, three and a half, and Zips. Because those really are much less accessible. OK, that looks great, Kristen. If you want to close that, thanks everybody for filling that out. A lot of tapes. That's surprising, especially for AV material. So moving on, this is just looking at some of the different media types and their longevity. So just a quick note to talk about how difficult it is to actually quantify the longevity of a piece of media. As I'm sure most people are familiar, it's very dependent on environmental factors, how it's stored, dependent on how often it's accessed and used, the type of equipment that it plays on. I mean, tape, which a lot of people obviously had, does not survive well with repeated replay, just because it's dependent on tension. And tension obviously mostly tends to rip things apart, of course, right? So there's just some numbers here that people can look through to try to at least get an understanding of when they need to migrate off of these physical carriers. And optical is really the big one, because there was a lot of digitization done. 
And optical was often used as storage in the 90s. And so a lot of information put on CDRs and DVDs from those projects of the 90s are really starting to reach the end of their longevity. So migrating off of optical is something I see being a very big issue in the next couple of years. And so I've also added two links to excellent reports, and these are also in the resources documents that are part of the broadcast. And so you can look at those and they give you a little better sense of handling and storage. And so what are some of the actions to get things off of physical objects? Vintage drives and machines, these are not too common, but I think we'll start seeing at least collaborative efforts to maintain and house old computer systems and machinery and replay devices as far as allowing access to those sorts of medium. And then a couple we wanted to focus on in this presentation and these are all tied together, controller cards and write blockers. And so I'll talk about these for just a minute and I have two pictures there in the middle of them. A controller card is essentially a device that allows your current computer to access an old drive. So controller cards are mostly used in order to access three and a half and five and a quarter floppy drives, which you can still buy on eBay for about 50 bucks for a five and a quarter drive. But obviously there's no way to plug it into your computer. So these are basically little interfaces that have a USB on one side and then a controller to the floppy drive on the other. And a write blocker is something that prohibits you when you use that controller card from actually writing to the device that you are trying to get access to. So it is what it says it is, it blocks writing to the old drive. And that goes back to the authenticity point that we were talking about earlier. So forensic software, we'll talk about this in a couple of slides. These are just software tools to help you do some of these activities. 
A photo station is interesting. When people get donated physical media, they often obviously want to have a picture of it because people writing on the outside of floppies is a very common thing. And so there's contextual information that can be gained there. So taking a picture of the storage medium, obviously with optical discs that's not usually the case, but capturing an image of the device is often important. And then a FRED is a Forensic Recovery of Evidence Device. And it basically combines all these toolkits into one big tower that you see at the bottom. So these are all methods of trying to acquire data off of old physical objects. And then I talk about backline here and we'll talk about this more in the backup and storage section. But storage, transfer and infrastructure will be things that we'll talk about more. So a couple of others, there is the BitCurator project. And this is a project to develop some of the forensic software tools that we talked about. So it is very emerging and not really usable at an everyday level at this point, but it's worth keeping an eye on that project. And what it does, essentially one of the stages, is help create a disk image. So a disk image is an exact copy of any piece of media and that can be a CD-R or DVD. It can also be a floppy disk or it could be an entire computer. And so it just means that you are copying everything that's on there. And it's a good preservation method because you're not actually trying to access the files. You're actually just making a full complete copy and you can sort of put it away and decide what to do with it later. Virus checking is something that will be done on anything and it's probably something your IT department is already doing in many cases. There are free tools for these and they're listed in the resources document and they're good to use. And for people that are more interested in this sort of larger area, digital forensics is what it's called. 
We've added a couple of links to some great reports that are very useful as far as wrapping your head around how we get things off of the old media. And so those are the challenges of the physical object. Access, obviously, depends on metadata, which Danielle talked about yesterday, so I won't go into too much detail. But some key principles are an inventory; most collecting institutions are already doing that. But it's important to also do it for pieces of physical media so that you know what you have, how old it is, and when you might need to migrate off of it. Unique universal identifiers, this just means an identifier for an object that can be a collection or it can be a floppy disk and that identifier will stay the same through time. This also applies to specific file names. So an identifier for a file should remain the same regardless of where that file is stored. So if it's stored on your laptop or if it's stored on the network or if it's stored on a DVD, the identifier should always be consistent. Locate, this sort of goes back to the inventory part. You know what you have and where it lives and this is applicable both within your physical institution but also within your digital environment. So I've called that cyberspace and meatspace, which are sort of the terms for that. One is online and one is on site. So the location of both of those is important. Descriptive information, we're all probably familiar with that. And then the photographing of objects I talked about a good bit. So appraisal and authenticity, these are also key concepts when talking about the accessioning or preservation of physical media that has the digital information. You know, work with your donors. If you're working in an institution that is accessioning things from donors, describe what you have when it comes in, back it up and just remember to align what you're acquiring or preserving with what your collection policy is and what your institutional abilities are. And nobody can save everything. 
This is a point we all often try to make in digital preservation. People feel very overwhelmed by the volume of the information they have. And the key point is to try to do as best you can in some standard best practices, and we'll talk about those in the levels of preservation section with the resources that are available to you. So I'll sort of stop there to see if there are any questions. And so feel free to put them in the Q&A box and we're sort of gonna try to combine some of them so that I can address them. Sure, we don't have too many questions yet. We have a lot of people who are talking about the different diverse types of materials they have in their collections. One of the common sorts of others is LP's phonograph, glass slides, wax cylinders in some cases. A few people have laser discs. So it's really a lot of different kinds of things. Some of those are not digital. But nonetheless, it's good to know. Yeah, well, I think when you're digitizing a lot of things like certainly glass plates and print photographs and LP's, after you're digitizing them, they are going, that digital object is gonna go to some form of media. So those aren't carriers of digital information, which is more of what we were focusing on, but they will play a role in digital preservation, especially as people digitize. Great. So one of the questions, and there are a couple of different things that I'm saying related to this, would be about media that's been advertised as archival. And I'm specifically talking about gold CDs here, and what it's expected lifetime is. And then somebody asked if anyone had any experience with the M disc, which is another media that's being sold as archival. So what are your thoughts on that? Yeah, certainly archival gold standard is basically a slightly longer longevity. Now, whether it's worth the added cost, people will have different opinions about these things. You know, the thing to remember with any piece of physical media is that it is ephemeral. 
So it's, I mean, an archival gold CD might last a little bit longer than a regular CD, but you will still have to get your information off of that and onto a different media at some point in the future. So, you know, I tend to shy away from recommending archival gold. It does have a longer longevity, but it also has a higher price, and those are kind of institutional decisions, and they need to be weighed against each other. So I think, you know, for some institutions, it'll probably be the right decision, and you probably will store it long past its actual longevity, you know, which is okay. And for others, you know, they might be migrating to an online storage environment sooner rather than later, in which case the cost is not necessarily justified. M-DISC I'm a little less familiar with. I've heard of it, and I know it has a higher storage capacity. But I think with any optical media, the longevity is not that much greater. Yeah. And someone just responded that Utah State Archives is adopting it, and we'll find out, if more people start using it, how that one does. And I guess one thing I always try to remember is that it still has to play on something. So even if the disc itself lasts longer, well, you still, you know, obviously we don't have floppy drives in our computers anymore, and it's very easy to think that at some point in the future, and this is obviously looking very far down the line, but that's something you need to do in digital preservation. It's easy to think that access to any sort of media has its own lifespan, right? Moving on, we had a couple of questions about BitCurator and forensic workstations in general. So briefly, someone wanted to know how a BitCurator image is different from just a normal backup. It is actually its own file, and disk images have their own file format. 
So I guess you could think of it in a way, and this is, I shouldn't be saying this, but as a zip file, so it's, I mean, it's an entire piece of information compacted into one disk image. So that'll be a whole floppy disk. It could be an entire two terabyte drive. So the volume doesn't matter. It's just making, essentially, it's making an exact copy of the whole thing. So a backup would just be a copy. It'd be the entire file system, not necessarily packaged into one quote-unquote object. Right, so one way to think of it is that normal backups think about files, and BitCurator is actually copying the media, the bits on the media. Yeah, that's a good way to put it. Okay, there's some other things that we'll respond to after this session, but we'll just keep going for now, and if we have time at the end, we'll come back or we'll answer these in writing after the session. Okay, so moving on, I think Danielle talked about this if people were on her presentation yesterday, but fixity is a core concept of digital preservation, and what fixity means is that your files, your digital object has not changed, so it is fixed, and its information is consistent through time, through when you got it, through when you last made a change to it, if there are changes, and this is very important in preservation because when an object is sitting on a server or on your laptop, there's no visual way to tell necessarily that it has changed. Now sometimes things won't open or they might look funny, but there can be changes to a digital object that are effected through time that are not visible to the human eye. And so fixity is essentially, as I put here, a digital fingerprint of a file, and what it is is an algorithm, which is just a long computer program that computes an identifier, and so this will be a sort of long number, it's a little difficult to view on the slide, but it'll be numbers and letters of, you know, 24, 16, or 38 characters. 
And so what you do, a software program will run this algorithm on a file, or it can run it on a whole computer and generate a whole list of checksums, and so the algorithm produces this number and it's often referred to as a checksum, sometimes referred to as a hash, sometimes referred to as a message digest, these all mean the same thing, and it's basically just this digital fingerprint, and so the utility of it is that when your object has changed or become corrupted, this algorithm will produce a different number, right? So the number stays the same only as long as your digital item stays the same, and so you can run these algorithms and then they will produce this checksum and then you can run that checksum against a previously produced one, like one you did a month ago or a year ago, and if they're the same, then you know that your object is still the same, and if they're different, then you know that it is corrupted or somehow changed, and you can do this without actually accessing the item itself. So this process of auditing them is often called a fixity check or a fixity audit, and I mean you can sort of understand clearly why it is essential to digital preservation, it's because it's the preservation of a digital object and you need a sort of key way to understand whether something has changed. And so our first homework exercise will sort of be focused on computing a checksum and then checking it, but we also wanted to run this poll here early in the presentation just to get a sense of how many people were actually generating these checksums, and this is called fixity information, you know, there's lots of terms for it, but it's essentially an identifier at the point of ingest or creation of a digital file. So people don't know, sometimes the IT department will do this, but you know it is usually a archivist or a collections librarian or a curator or a person in charge of managing digital assets. So yeah, not bad numbers. 
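As an illustration of the checksum comparison just described, here is a minimal Python sketch. It is an assumption of this write-up, not one of the tools listed in the course resources: the function names `file_checksum` and `fixity_ok` are hypothetical, and SHA-256 is chosen as the default algorithm only for the example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex checksum (the "digital fingerprint") of one file.

    The file is read in chunks so large files do not need to fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_checksum, algorithm="sha256"):
    """Compare a freshly computed checksum with one recorded earlier.

    True means the bits are unchanged; False means the file has been
    altered or corrupted since the checksum was recorded.
    """
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

In use, the checksum would be recorded once at ingest and stored alongside the inventory; a later call to `fixity_ok` with that stored value is the fixity check.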
So it's good to see that, you know, we have about eight, eight percent or so are actually doing this, and if you're generating it then you're most likely also auditing it. There are some tools that we listed in the resources as well as some documents that you can read about it. It's not an overly difficult process for people to generate this information, and how periodically you run these fixity checks or fixity audits is an institutional decision, but certainly generating fixity information is a key component of preserving digital items, because you will not always know when something goes wrong. All right, we can close the poll, it looks good. And so here are just some screenshots, and I'll go through this quickly, and this is basically what it looks like. You can, you know, it's a little hard to see, but you can see these are just either software systems or machine environments running through a whole file system and they generate these checksums, and it's just a long string of digits, and it creates a report, and at a later date you can run the software on the exact same set of items, and it'll create a new report and compare them and tell you when something has changed. And so, you know, what fixity guards against is, of course, bit rot, and you see in this image this is something that through being uploaded online has suffered bit rot, and transferring digital files often results in something happening to the file that can affect it in this manner, and sometimes it's not quite as apparent as this, and sometimes it's entirely invisible, and that's sort of the important point of fixity. So I've listed a couple of software tools that are free and can be downloaded to play around with, and then Bagger is also a Library of Congress tool for transferring data, and it also generates checksums when using it, and there are some guides and videos that we link to in the resources that are worth looking at if you're interested. 
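The report-and-compare audit described above can be sketched in a few lines of Python. This is a hedged illustration only: it is not how Bagger or the listed tools work internally, and the names `build_manifest` and `compare_manifests` are hypothetical, as is the choice of SHA-256.

```python
import hashlib
from pathlib import Path


def build_manifest(root, algorithm="sha256"):
    """Walk a folder and return {relative file path: hex checksum}."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.new(algorithm)
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(earlier, later):
    """Report which files changed, went missing, or appeared between audits."""
    return {
        "changed": sorted(n for n in earlier if n in later and earlier[n] != later[n]),
        "missing": sorted(n for n in earlier if n not in later),
        "new": sorted(n for n in later if n not in earlier),
    }
```

The first manifest would be saved at ingest; each later audit builds a fresh manifest over the same folder and compares the two, so any file listed under "changed" or "missing" is a candidate for replacement from another copy.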
And then I talk about fixity checking and audit there and some of those bullets which we already discussed. And so I'm gonna run through this part a little briefly, and it's mostly about the logical object. And so if we think about that, that essentially means, you know, the software, the application, the computer code that can take that physical media and turn it into something like an image or a text that can be read. So there's a sort of middle stage there, and there's a lot that can go wrong there just as there is with physical media. Obsolescence, we all know about old file formats, and you will try to open something on your computer, and it doesn't open, because the system doesn't know what it is, and this is, you know, a very complicated part of digital preservation, which is why we're gonna kind of skim through it here, but it's just something to keep in mind as you try to establish a preservation program within your institution. So access, I mentioned appraisal. Obviously if you are creating or taking from donors digital materials, what format it's in will be very important to whether you can access it. And authenticity is very similar to the authenticity issue that we talked about with physical media. And so, you know, the challenges are that recognizing and processing all these many different types of files is very difficult and complicated. And that a format tells the computer essentially what to do in order to access that object, but it's very difficult for us to preserve that information. So, you know, I'll skip through these, and I guess the key point to make here is just to try to remain cognizant that there are open file formats like TIFF, and PDF/A is a preservation format that is a PDF. So saving things in open file formats, that means that the community knows what the programming is that allows it to be opened. And if the community knows it, then it will be obvious to people throughout time. 
Whereas, you know, specific pieces of vendor commercial software, we don't always know what the code is because that's where they make their money. So the action is just to remain cognizant of open formats. And so then finally, and this will be the last part of this section, is the conditional object. I talked about this a little bit at the beginning, and that's just remembering that for every institution, there's different levels of resources and knowledge and institutional knowledge and institutional practices that can be dedicated towards digital preservation. And so sort of our job as preservationists or curators are, you know, nonprofit workers working with cultural heritage material or other digital information is just to do what we can with our institutional resources in accordance to community accepted standards and best practices. And so that's the conditional object. And here are some frameworks of digital preservation systems. Now, these are very complex and high level, but we've sort of done some exercises and some of the resources so that you can at least look through them to try to get a sense of what people, you know, the large frameworks that big institutions are using, have a lot of value as far as telling you what they're doing in a very simple way. And I know these diagrams, of course, don't look simple at all. But if you actually read through some of the requirements when you look at the documentation for these frameworks, you'll pull out little pieces that can help you implement policies within your institution, regardless of, you know, whether you're actually going to meet all the requirements of one of these frameworks. And I called this a series of nows because I think that's a nice phrase for thinking about digital preservation and backups and storage because as we talked about with some of the, in the previous slides, this is something that happens through time, right? 
And so things will always become obsolescent or will change formats and media. And what we're trying to do, and this ties into the institutional piece, is to do the best we can right now. And that now will always be changing and it will be never ending, essentially. But digital preservation is all about access into the future. And nobody can tell the future. So what we try to do is what we can right now. So that finishes part one. So, Danielle, if you want to try to organize some questions, we can maybe talk about it. We'll get a little more specific on backup and storage next. So you can save those questions for that section. Yeah, so I think people were talking and there was some information going back and forth in chat, some good advice that came out of this. But in general, people were asking about procedures for fixity. And someone phrased it as: if I have two copies of it and one copy goes bad, the fixity changes, then I get rid of the bad one and I replace it with the good one. Is that how you would summarize it, or do you have other thoughts about how fixity works in a practical sense? Yeah, and we'll talk about this a little in backup and storage, but obviously multiple copies is a key component of digital preservation, right? I mean, multiple copies, multiple places. So, yes, when you're running fixity checks, you can run them separately on different sets of items, but as far as a workflow, it will be at the front of the workflow, and then when corruption is detected, you would replace the corrupted file with the authentic copy. And Catherine then followed up with, is there any way to reverse bit rot in a corrupted file? There is not. Okay. Yeah, basically no. I mean, if you have an original copy you could, but if you have an original copy, you would just be replacing it.
You wouldn't actually repair the other one, because the great thing about digital materials is that they're easy to copy and replicate. So once you detect corruption, you would generally delete that file and replace it with a good copy. So this is one of the reasons we talk about digital objects being fragile: we can't do repair for the most part. Right, yes. And that's why having fixity at the front of a workflow, at ingest or accessioning, however you want to say it, is so important, right? Because as soon as you get the files, you will want, if you can, to generate fixity information so that you can then track it and authenticate the object while it is in storage, or if it gets passed around to be described, or if it goes into different access systems or OPACs or collection management systems. So yeah, incorporate fixity, and again, it's not too overwhelming a process or procedure. And if you have an IT department, they will be familiar with this concept and with some of these tools. So it's certainly worth talking to them. Okay, well, I think that then we had some questions about specific tools and a question about checksums on Macintoshes. People were referring to specific tools that they've used or have information about. Do you want to talk about some of the ones you briefly put in the resources? Sure, I put one in the resources for Windows and one for Mac. The one thing I'll note about fixity is that there are not a lot of tools with graphical interfaces. It's usually something that's built into every operating system as a command line function. That means you're actually entering the commands into a terminal window, which is what it's called on Mac, or the command prompt on Windows machines. But I did put two pieces of software on there that are both free and both very easy to use and understand. I think MD5 is the one for Mac and MD5summer is the one for Windows.
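To make the generate-then-audit idea concrete, here is a minimal sketch in Python. It is not one of the tools named in the resources; the function names (`file_checksum`, `build_manifest`, `audit`) are hypothetical, and it assumes the files sit in one folder on a drive you can read. It uses only Python's standard `hashlib` module, with MD5 as the default algorithm because that is the one mentioned above.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex checksum of one file, reading it in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder, algorithm="md5"):
    """Generate fixity information: map each file's path
    (relative to the folder) to its checksum."""
    folder = Path(folder)
    return {
        str(path.relative_to(folder)): file_checksum(path, algorithm)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def audit(folder, manifest, algorithm="md5"):
    """Re-check a folder against a stored manifest and return the
    relative paths that are missing or whose checksum has changed."""
    folder = Path(folder)
    problems = []
    for relative_path, expected in manifest.items():
        target = folder / relative_path
        if not target.is_file() or file_checksum(target, algorithm) != expected:
            problems.append(relative_path)
    return problems
```

In this sketch you would call `build_manifest` once at ingest, keep the result alongside your inventory, and call `audit` on a schedule. Any path it returns is a candidate for replacement from your other copy.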
And so those can be used by people on their desktops to generate and to audit. They both have audit functions so you can run, you know, they'll create a file once they run a fixity check. So those are good. And then there are other ones out there as far as command line use. And then Nicole asked this question. And honestly, I've never tried to do this on a large data set of any kind. So any recommendations on software to create Fixity on large amounts of data up to one terabyte? Have you tried that? Yeah, the command line tools would work better for that. I mean, just like any piece of code or software, it'll take time. But yes, I've run Fixity checks on, you know, a terabyte worth of material. It'll take a while, but not, you know, hours. You know, I don't know what rate it creates a checksum at, but it's a lot. I mean, it's probably, I don't know, 100 files a second maybe. Different checksums I think work at different speeds, but yeah. Yeah, that's true. Someone else, Bryce asks, will different checksum generating software always calculate identical checksums? Yes, so I talked about the algorithms that you use and we mentioned MD5 has come up a lot because that's a common one. So each algorithm will always produce the same checksum. So there are different choices of algorithms you can use. There's MD5 and SHA256 are probably the most popular. MD5 is actually when you enter a password into whatever, Facebook or Twitter or something. The way that it encrypts your password is actually through MD5. So when we always hear these news stories about people's passwords getting stolen, essentially what people have done is steal a database of checksums. And then because MD5 has been around for a really long time, a lot of people have actually managed to figure out what some of the algorithms are. 
So when you hear about these password hacks, it's actually people just figuring that out. In general, heritage institutions are not things that people are really trying to gain a lot of access to a checksum for. So I wouldn't worry about MD5 as far as its security for the kind of information we're talking about. Right, that might be a future problem, but not right now anyway. Okay, so I think some of the other questions, which get into things like how can we minimize the likelihood of bit rot, et cetera, you'll address as we go into backups. So do you want to go into that? Sure, so part two, and we were just going to start with a poll in this one to see what people's current practices are. And this is how many copies of digital content does your institution keep and how many different geographic locations does it keep them in. So we have a couple of options there. Well, this is pretty exciting. Two copies, two places is definitely the minimum standard, but it's great to see so many places doing it, because not many that I have encountered are actually doing two places. That's good, almost half the respondents with two copies, two places. And so that looks good, you're free to close that, Kristen. So I'm definitely happy to see people having multiple copies, which is obviously important, especially when we're doing fixity checks. And two places are really important. I'm not going to talk about it too much today, but obviously there's the environmental hazard and the institutional hazard, and obviously we had Hurricane Sandy here in New York and we have had disasters in the past. So two places is very important, and that should be two places where at least one is outside of your building, hopefully. I didn't define that too specifically, but keeping copies in the same spot is called co-location. They should not be located in the same spot, and that's key to two places. And so in backup and storage, we'll talk a little more low level and keep it pretty simple with some best practices.
And so I have this triple deuces rule, which is three twos. So two copies in two places, which it was great to see everybody doing, and then two media types is important too, right? We talked earlier about the longevity and degradation life cycle of specific forms of media. So if you put two things on the same kind of media, which are hypothetically degrading at the same time or becoming obsolete at the same time, then there's a certain hazard to that. So most people will have something on spinning disk, which is just a hard disk, either external, internal, or on a server, as well as, say, optical disc. So triple deuces is the minimal standard for backing up digital information: two copies, two places, and two types of media. And then three copies obviously is a slightly better standard for the obvious reasons, along with two places and two media types. And so the other important thing, and we talked about this earlier when we were talking about physical and logical objects, is knowing what you have. Having worked with a lot of cultural heritage institutions, especially smaller ones and nonprofits as well, they might be preserving things on multiple copies in multiple places, but if you pull out a CD-R, it is often very difficult for someone to tell you what is on there, both looking at it and when you put it in a computer. So inventorying what is where, both on the disc, on the piece of media, as well as where it is in the building or outside the building, those are all essential concepts. And then fixity, as we talked about, is something that people should be striving to generate when they're preserving digital information. And the details of some of these are dependent on your institutional resources and institutional requirements. But certainly the triple deuces is where to start, and then three copies, if you can manage it.
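As a rough illustration of how the triple deuces rule could be checked against an inventory, here is a small Python sketch. The `Copy` record, the `meets_triple_deuces` function, and the location and media labels are all hypothetical; the only thing taken from the talk is the rule itself: at least two copies, in at least two places, on at least two media types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # where the copy lives (hypothetical labels)
    media: str     # what kind of media it is stored on


def meets_triple_deuces(copies):
    """Return True when there are at least two copies, in at least
    two distinct locations, on at least two distinct media types."""
    locations = {copy.location for copy in copies}
    media_types = {copy.media for copy in copies}
    return len(copies) >= 2 and len(locations) >= 2 and len(media_types) >= 2


# A made-up inventory: each digital object maps to its known copies.
inventory = {
    "oral-history-001.wav": [
        Copy("main building", "hard disk"),
        Copy("offsite partner", "optical disc"),
    ],
    "scrapbook-scan-014.tif": [
        Copy("main building", "hard disk"),
        Copy("main building", "hard disk"),
    ],
}

# Objects that still need another place or another media type.
at_risk = [name for name, copies in inventory.items()
           if not meets_triple_deuces(copies)]
```

Here `at_risk` ends up holding only the scrapbook scan, because its two copies share one building and one media type. Keeping the check tied to the inventory is deliberate: the rule is only useful if you know what is where.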
And so what are the types of storage that people use? And there are basically three types, online, near line, and offline. And so what do these mean? Online is essentially network attached storage so this means you have immediate access to it. A lot of us have the network drives in our offices and so that is online storage, both within the office environment. Something, you know, if you have web access, you obviously have immediate online resource to something you might be storing on an external server or in the cloud, you know, or even, you know, say in Google Docs if you're doing something like that. So online is basically means immediate access. And then offline obviously means you don't have immediate access to it. And that's often called a dark archive if people have heard that term, where you can store things that you don't expect to be available to you, not just immediately, but really within any reasonable response time. And then near line is sort of in between those things. So it's essentially not attached to a network so it does not have direct availability. But you can get it, a lot of universities have near line storage for preservation and this basically means that you can get it within like a day or two if you need, you know, a large set of images or publications or scholarly publications. So those are three types of storage. And questions to ask in your institutions when you're making decisions around this, obviously a lot of us operate mostly at the online or near line stage. That's how often do you need to access this material? Are preservation copies separate from your access copies? You know, a lot of people obviously have their images online. Those are not necessarily their preservation copies of their digital objects. How are your preservation access copies created and or managed? A lot of people keep their preservation copies in an offline storage environment and access copies will be in their OPAC or up on Flickr or something like that. 
And then how do your systems and workflows play in with these other systems, right? So offline is going to be something you need to manage with specific pieces of software. And online is something you might want to have tied into your content management system or to your OPAC. So those are things to think through when talking about storage. And so we wanted to put this slide in here, the tech stack, just to give people at least some exposure to the verbiage when you're working with IT. I know a lot of people on here probably are their own IT department, and so some of this might not be super applicable, but whenever you're working with IT folks, be they internal or contractors, it's good to at least have some exposure to this key terminology, because it really dictates a lot of what you can do when you're doing digital preservation as well as when you're building online collections and OPACs and catalogs. And so we'll just go through the terminology pretty quickly. A LAMP stack is something you often hear about, and that's what this little image is. That's basically an environment for running a server, right? A server will have Linux, which is the operating system that tells the server what to do. Apache is basically the web server software that runs on it. MySQL is a database, so that's what puts the information into the server and structures it. And then the P is usually PHP, or Python. These are code, basically middleware code, that allows what you're doing on the internet or on your desktop to interact with that server. And so things like Omeka, for people who are familiar with that collection management system, and Drupal and some others are typically built for this LAMP stack environment because they need these specific pieces in order to operate. So if you have, say, a Windows server, then you might not be able to install certain pieces of software.
So it's good to talk to IT about that. A RAID is essentially a device that has many different hard drives in it, and it will store information across all those hard drives. So it's not necessarily multiple copies, because there's usually just one copy. Instead of being structured on one hard drive, it's split up across many. And network attached storage, we talked about that with near line and online. It just means that your storage system is tied to a network so that you can access it from your computer. SAN, a storage area network, is basically a network of networks. So you would have multiple devices connected to a network, and then a SAN would be that network. And then CMS is content management system and DAMS is digital asset management system. It's hard for us to give you recommendations on what to do or what to use or specific applications, but certainly knowing the technology helps you make the decisions that the technology requires. So we'll probably get a good number of questions about that and we'll try to address them. And then when you're thinking about digital preservation systems for your storage and backup, there are a couple of different types. You can work with a vendor. A turnkey solution is the term often used for a piece of software that is installed and does everything. It will have a front end interface that you can use as well as manage your storage system and your LAMP stack. And obviously a vendor would most likely set that up, unless you have an IT department to help you do that. And then open source software I just want to mention because it gets a lot of press in our community. It is very valuable and very community driven, but it's also worth remembering that it does require its own maintenance and involvement as far as use.
So it is technically free to download and install, but as far as support and maintenance, it does require staff time or expertise, and that's important to remember. So if you're thinking about backup and storage systems, you need to think about the resources and expertise that are in-house; what you can pay, if you plan on working with a vendor; your requirements and needs; what type of material you are working with; what your requirements are as far as near line or offline storage; and how it ties into the systems you're currently using, even something as simple as Microsoft Office. That plays nice with certain pieces of technology and not so nice with others. And then also, those of us in cultural heritage are obviously putting a lot of important assets and metadata into these systems and spending a lot of time creating that metadata, so it's always important to keep in mind that sometime in the future, because technology always changes, you'll need to get your data out of that system. So data in and data out is something to remember whenever talking to vendors or IT about backup systems and storage. And no solution is permanent, which is our whole series of nows point. And so here are a couple, and I just put up the logos for people to research. They're generally in the resources and are pretty easily findable online. Archivematica is a digital preservation front end system. It does not manage storage, but I think it has a fixity generation tool in it and a number of other tools to address some of the challenges that we talked about in the earlier slides. ArchivesSpace is what Archivists' Toolkit is going to be turning into, so it's a new archives collection management system. Islandora is a preservation system. DuraCloud is an online storage system geared towards cultural heritage and nonprofits, and then MetaArchive and LOCKSS are two initiatives.
LOCKSS stands for Lots of Copies Keep Stuff Safe. These are collaborative initiatives, and I'm sure Liz and Tom will talk about them in their presentation next week, for people to share infrastructure resources for preserving digital objects. So you would, say, have some sort of computer hardware on your site, and then you would be hosting information from other institutions in your consortium and they would be hosting yours. So lots of copies would be spread around the network that the consortium has, and there's a fixity checking element involved. And then Fedora is an open source digital repository. So those are just some names to think about, and now the key concepts, to sum up this whole part of the presentation. Multiple copies, multiple places, right? Seems like people were doing a pretty good job with that. Multiple media types as well. Universally unique identifiers, we talked about that earlier. When a digital object has an identifier, it needs to stay consistent throughout time regardless of where the object is located. That really just makes everything easier, and you would be surprised how many places will change the name of a file or a digital object when they put it on a different piece of media or when it goes to a vendor or something like that. So UUID is the acronym for that. Inventory and identity, we talked about that a good bit at the beginning: knowing what it is, both in your digital environment and in your physical building, and if you're keeping things in multiple places, knowing where they are. Record and monitor fixity information. So hopefully some of the homework will help people understand that better, and there's material in the resources that people can refer to, to try to wrap their heads around putting that piece into their workflows. Work with IT, I probably don't need to tell anybody that.
Be adaptive. We talked about the series of nows, and the software will change, formats will change, media will change. So digital preservation is always trying to remain flexible, and that ties in with the data in, data out principle, and that links into our last bullet, which is that systems change but data shouldn't, right? So that is the key part of digital preservation. So that's backup and storage, and it looks like we have some questions. So do you want to talk about them a bit, Danielle, and I'll try to answer? Sure. Boy, do we have questions. Yeah. Just a quick first note, we posted the link to the handouts. Unfortunately, when Adobe Connect took his slides, it did make the background very, very prominent. So if you're having a hard time reading, be sure to look at the actual handouts that you can download as a PDF. They seem to be pretty readable. Yeah, I don't know why it's that gray. Yeah, Adobe Connect does funny things sometimes. So we've had a lot of questions, and I don't know that we'll be able to get to all of them, but I'm going to try to digest them. The first one was, can you explain RAID a little bit more? There are different RAID options, and Nicole asked, can files become corrupt on a RAID system, or how does that work? Yeah, I mean, an easy way to think about a RAID system is as something like a network of hard drives, but they're all within one system. A redundant array of independent disks is what the acronym stands for, and that just means that there is a piece of software that is writing information to all those disks, but it's not writing multiple copies of the information. So what that tells you is that a disk failure in a RAID array will not affect all of the information stored in there.
It'll only affect the one disk that fails. And hardware, the external drives and internal drives, often fail very easily, because the way they operate is very dependent on being hermetically sealed and they're just a tiny disk spinning very quickly, and a hard drive crash is a term you often hear. So a RAID array addresses that by isolating the catastrophic damage that might occur with a head crash, if that helps. So it's just multiple hard drives. It's not writing information multiple times, it's writing it across them instead of onto just one drive. Okay, and then the other question we got a lot of variants of concerned cloud storage, with people wondering what exactly is meant by cloud storage, and I gave them my flip response, which is that it's a hard drive you don't control. But could you talk about some of the cloud storage options? People are wondering about safety and perhaps some of the tools that you might be able to use. Yeah, cloud storage is certainly an emerging thing. We're familiar with it as far as all of our Gmail and the social media that we use, but as a preservation option, I guess the first thing to say is that it should never be the only place you're storing information. Partially for the reasons Danielle mentioned: you are dependent on a corporation, you're dependent on hard drives and server systems that you have no control over, and you don't know where they are. A lot of people probably hear about Amazon Web Services, AWS, going out a lot, on the East Coast at least, and then you can't access Gmail or whatever cloud-based device you might be using. So then there are options for it as a backup. DuraCloud is a great one, and Amazon came out with Amazon Glacier relatively recently, and this is basically offline access. They give you very cheap storage, but they very highly limit your access to it as far as the amount that you can pull out once you put it in there, so it's intended for long-term storage.
So the cloud is an emerging and cheap way to store things online, but it should never be the entirety of your preservation solution. You should always have multiple copies, but I think as a place to store archival material, dark archives, offline storage, it certainly has a lot of potential. Just a couple more things and then we'll move on, though I think we're doing quite well in terms of time. There are some things we'll come back to at the end, but about your triple deuces analogy, Sarah Andrews wanted to know if that carries over to higher memory need items such as video and audio collections. How do you handle things that are really large, especially for two types of media? Yeah, that's a great question, and I've tried to talk about institutional dependencies and trying to align best practices with the resources available within your institution, and AV is the one that always comes up. I had a big meeting about this yesterday with a museum here in New York that is videotaping every public event that happens, and they just have this massive, it's not exactly an archive, it's basically just storage on site. They're not doing multiple copies in multiple places because it is cost prohibitive. They want to, and they're looking for solutions to address this. It is a principle, so yes, it does apply to AV, and AV can obviously be gigabytes in size for single files. So it is applicable, but I am certainly cognizant of how difficult it is to apply to collections that grow quite large in size, and I think in some of those cases that's where cloud storage can come into play, or things like LOCKSS or other collaborative consortial approaches to infrastructure. But yeah, two copies is applicable to AV and other large files. Yeah, and things that don't fit even on a DVD are challenging when you want to try to get different media types in there.
It looks like they've switched out the PowerPoints to one that doesn't have a background, so for those of you who are having problems, I hope this will be better. One last question, and this is more of a philosophical question about digital objects, I think. What is the original of a digital item? So when you make a backup, and if you don't capture the fixity, perhaps, do you lose authenticity with a digital object? What are your thoughts on this? It is a philosophical question, but how does it serve the original when we're moving it around? Right, well, this is a great question and it's one that comes up a lot, and it's one that makes for great articles and talk over beers. Digital objects seem very fragile and difficult to manage at times, but one of the beauties of digital information is that it's so easily copied. So I think authenticity in a digital environment becomes much more dependent on metadata and management than it does on information content. Some people would disagree with some of this, but if you can make a bit by bit copy of a digital object that has the same checksum, is accessible by the same software, has the same UUID, and everything about it is the same, then it is just as authentic as the thing that may have been given to you. But what has changed is that you've made that copy, right? And so I think this is what the PREMIS metadata schema tries to capture. It's a digital preservation schema, I think, Danielle, you probably referenced it a bit yesterday, and it's built to try to capture how objects change, and what changes around them even if they themselves don't change.
So we will never be able to keep a floppy disk ad infinitum into the future. Just like a piece of paper, things degrade over time and eventually will rot and fall away, and that's true for physical media, and digital information is dependent on physical media. But the actual object, as long as it's the same and is duplicated, is just as authentic in my book. So authenticity becomes very different, because it becomes more about how the object is managed and the information about how it was acquired and managed in its context through time, more so than, I think, in traditional preservation in archives. Excellent, and I would just add that something might be authentic, but if you aren't capturing that fixity information and checking on it, you just don't know. Right, yeah. A lot of the digital forensics material that we put in the resources, as well as in the online community, comes out of the legal world and the criminal world. So a lot of these tools come from authenticity in a legal context, and that's not one we operate in much in nonprofits and cultural heritage, but many of the digital forensics tools, as well as the idea of how we interact forensically with information, come out of this authenticity context that's driven by legal rights and things like that. So it's good to remember. Yeah, okay, I think we'll move on. You could address really quickly, actually, someone popped up with what is near line storage, if you wanted to give a quick definition of that one again. Sure. So we talked about how online is very immediate access, where you can go right to it, put something in, pull something out, and offline is basically a dark archive, which is not meant to be accessed much at all, if ever.
So near line is a little fuzzy, but it's between those two. Near line storage would be, maybe, your IT department has a server somewhere in a different building, if you're a big institution, and you can request from them content that is held in their storage systems and they will deliver it to you in, say, two or three days. So different copies might be online for display or online for access, but perhaps the preservation copy is not directly accessible by people, because for something to be accessible means it is somewhat endangered: you can have lots of users going in, and a lot of people have access to online storage. So near line is managed, because it's not directly accessible by as many users. Usually you can request something and you get it back in a couple of days. Very good. Let's go ahead and move on to levels of preservation. We have had a number of people from small institutions asking, how do I deal with some of this, and maybe this part will make it clearer to you. Yeah. So as I mentioned at the beginning of the presentation, this is a project by the NDSA, the National Digital Stewardship Alliance, which came out of the Library of Congress, and some of the background to this is basically to answer just what Danielle was talking about, which was trying to make some of these principles of digital preservation both scalable and more accessible. So there's very little jargon in here. It's not one of those big framework diagrams that I showed earlier that is just difficult to look at and certainly even more difficult to understand. But what it does cover is key concepts, which we've been talking about throughout today, and process areas. And the NDSA has over 150 institutions as members. I certainly encourage people to join if they're interested. It's a collaborative community group.
Everyone from large research universities to tiny historical societies. So it's a very diverse membership and so this was driven by people at all different skill levels at all different types of institutions. Coming together to define what are accepted best practices through community participation in the creation of this and it tries to build on some baseline considerations and can be used as a sort of self-assessment tool and we'll walk through some of it in the next slide. And just as we said all along what it doesn't cover is institutional context. You know it is meant to be accessible to everyone but it can't answer to everyone's specific unique needs and levels and expertise. It does not make specific technology recommendations as far as pieces of software or types of systems. It doesn't tell you how to do something specifically and obviously policy is something that everyone will have to develop on their own. So another goal was to keep it on one sheet of paper, right? The most accessible thing to anyone is having it on one thing that you can hold in your hands and so I'll try to walk through some of the boxes and talk about how they applied to what we've talked about and certainly answer any questions about them and so you see the columns here. 
You have level one, level two, level three, level four, and so the document is scalable. Then you also have these areas of practice: storage, which we talked about a lot today, and geographic location; file fixity; information security, which we didn't touch on too much; metadata; and file formats. So you can use this to see where you are as far as doing digital preservation, but also use it to plan for where you want to get to. And of course the levels and the areas of practice are independent of one another, so you could be at different levels for different activities. Within each box there are going to be a couple of things that are considered community accepted best practices for doing digital preservation in this area and at this level. So we'll just talk mostly about levels one and two. Two complete copies that are not co-located: that just means two copies in two places, which we had a great response for people doing, and that's very easy to implement. And for data on heterogeneous media, that basically means what we were talking about with physical media earlier: get the content off that medium and into your storage system. So there is an expectation that people will be migrating off of things that can become obsolescent or can degrade. So floppies, we had a lot of people with floppies, which was exciting to see because that means you're preserving them, but it also can be dangerous because they are well past their longevity at this point. So those are those two. File fixity and data integrity we talked about: check file fixity on ingest if it has been provided, and if it hasn't, then create it. We talked a good bit earlier about how to at least think about that, we're going to have you do it in your homework, and we also provided some tools in case people want to try to implement it in their own institutions. Information security we didn't touch on too much, but I guess I talked about it a little in talking about online and near line storage, so identifying who
has the authority to read, write, move, and delete individual files and digital objects. So nearline storage is sort of a way to try to address that: you give fewer people the ability to access things, and managing access helps preserve content. And people are not necessarily there to try to alter things intentionally, but we all obviously have probably lost files, and overwritten things that we had done, and accidentally deleted things. So there is that fragility, that it can be mismanaged accidentally, and information security tries to address some of that. And so knowing who has those authorizations and listing it is level one information security.

Metadata is just having an inventory, which we talked about, and where it's located. I talked about that a lot throughout the slides. And then ensuring the inventory is backed up and not co-located, as well as the data, so we talked about that too. It's not just multiple copies in multiple places of the actual digital objects; it's also multiple copies in multiple places of the documentation or metadata that tells you what they are and where they are.

And then file formats: I mentioned that it's important for people to just keep in mind trying to use, and encourage donors or people that you work with to use, open formats so that they are sustainable and accessible through time. And that could probably be its own presentation. So that's level one.

Level two: the idea of the levels is they get a little more rigorous, but they sort of touch on the same features. So we move up to three copies, one of which is in a fully different geographic location, and creating documentation about your storage system. You're actually checking fixity at level two, using write blockers, which we talked about, and virus checking, which we also talked about. Information security is documenting those restrictions that were in level one. And then metadata gets more robust as you go up the levels, so you would be creating different types of metadata, and then having an inventory of file formats
in use. So I'll sort of stop there, just because I think those two levels are more applicable to everyone on the call, and we've got, what, 15 or 20 minutes left. So I think it'd be good to just take some questions about the levels of preservation. I mean, we have a poll in the next slide, so having explained it and gone through it, we'll do a quick poll just to see where people think they're at, and then we'll go back to the previous slide after that and talk about some of the boxes and suggestions that are included there.

That sounds like a great idea. So this is kind of what we had predicted, I think.

Yeah, I mean, it looks like it's a good mix. And I think it'll be interesting to get some questions from people not yet at level one, certainly as to what they see as the biggest hurdle to getting there. You know, as I was talking to Danielle about before the start, there's always been talk about having a level zero. It's kind of an ongoing discussion, but it's very difficult to define what that would look like, and telling people they're at level zero is not a good thing to do.

I like the phrasing "not yet at level one."

Yeah. Well, you know, a lot of the discussion was, what is level one? And, you know, those are basically the best practices: what are the bare-minimum best practices for preserving digital information? So there was obviously a lot of debate put into level one. So we can close the poll and I'll get back to...

Yeah, and I think at this point we're at final questions. So if people have comments or questions specifically about how they might help their institutions get to level one, feel free to share that. But we do definitely have a few that we hadn't gotten to earlier, and so I wanted to get back to some of them. One of the things that tied over a couple sections of your presentation was, you know, we talked about preserving physical equipment, but Nicole asked the question: what about preserving, or the longevity of and/or access to,
the checksums used to create fixity, the algorithms for those? Any discussion about that?

That's a good question. I mean, the sort of problems that we mentioned about MD5 suggest people will be using it less in the future. But I think it also comes back to our sort of "series of nows" idea. You know, as long as that algorithm can do what it needs to do at that time, which is authenticate a checksum that it previously generated, you can then simultaneously generate one using a new algorithm, right? So at the same time you authenticate that this digital object is the same, you can then use a more updated algorithm. Actually, a lot of the fixity tools we have will generate multiple ones at one time, so you're never necessarily reliant on one algorithm. And all the tools are doing is implementing the algorithm itself, so you'll never be dependent on a piece of software to do it.

Yeah, and I think, you remember, this kind of goes back to the presentation on digitization, where focusing on open formats is a very good thing, open and popular. So if you choose a proprietary algorithm where only one provider will let you calculate the number, then you might have more problems in the future. MD5 and SHA-256 are probably all you will need.

Yeah, and they're both open. We could recreate new software to calculate them in the future.

Yep. This goes back to a media question.
If you have a master or preservation copy of your image, document, whatever, does the media wear out over time? Do you have to worry that eventually your CD will die and you can't copy things off of it?

Yes. That's a big problem with optical disk, is that one scratch means the entire disk is inaccessible. So that's very different than the analog world, where, you know, a record with a scratch on it will still play, which is why optical disk is a little problematic. They were also pretty cheaply produced; this is one reason they are so cheap to purchase. So this really just gets back to the multiple copies idea: if you have multiple copies and one is destroyed, then you'll be okay. But physical media always degrades, and a lot of them have very short life spans. And with ones like optical, I mean, you do have this problem where it's entirely unreadable with what could actually be very minor damage. So it's not so much that they wear out the way you would think of a tape wearing out because it's been played a lot, but it just is more likely to be damaged over time. I mean, yes, it does wear out; anything physical wears out. This is, you know, sort of a sad fact of life. And actually, I mean, digital objects also: the more they are accessed, the more they are likely to get bit rot. So bit corruption generally will happen more frequently the more frequently something is accessed. This is the whole reason people create access and preservation copies: if you were accessing one specific thing the whole time, it's much more likely to degrade. So yeah, physical media definitely degrade, certainly with floppies: a lot of glue and a lot of, you know, thrown-together things back from the early computer era that actually degrade over time, so it's more difficult to read and access.

Someone was asking about options for multiple copies at the point of digitization, and I think he had a clarification of that somewhere, but I can't find it right now...
But just in terms of workflow, are there recommendations for how to make multiple copies, multiple media, etc., at the time you're digitizing materials?

Yeah, you know, it depends on how you're doing digitization, and I think Jake probably talked about this a good bit in his earlier presentation. But usually you're creating preservation copies at the point of digitization and then generating an access copy either right after that or maybe at some point down the line. So usually you're scanning at a pretty high quality, you know, 300 or 600 dpi or something, and that's usually a preservation copy. So you can create another preservation copy right there, or you can create an access copy and then never use the preservation copy again. I think it makes sense to make it part of the digitization workflow as far as making another copy and putting it somewhere. And on past digitization projects I've worked on, this is what we did: you know, you scan at a high resolution all day, have a big file, make a copy of the preservation master, and then work with that, and the original one will go onto the server, onto a spinning disk, into your storage system.

And the clarification, which I found, and it looks like Kristen also found: he was talking about a 200-page document that you're scanning, and it fails at page 113, so presumably the last however many pages are not imaged, or
there's something that causes the whole file to fail.

This is just... and it's the same, especially for audio and video: this is why someone, usually a human being, needs to actually look at the file or listen to the file after you digitize it, to verify that your digitization worked. So, I mean, text files are complicated because it's basically a whole bunch of images that need to be stitched together through metadata. But every scan is a discrete digital object, so the previous ones would not be problematic, and hopefully you could just redo whatever has failed. So hopefully, before you designate something as your preservation copy, you've actually verified that there is the information in it that you want to be preserving.

I think this is going to be our last question, and it's another sort of philosophical question. Sheryl McClon says: if you have an external drive with a couple thousand JPEGs created over a seven-year period, obviously your collection is at risk. External hard drives go bad after a while. What next? What is the next type of storage media that we think will be the salvation, or at least the next thing that we put it on?

Right. I mean, spinning disk is what we're going to use into the future. So, you know, it can be thought of a couple of different ways. An external drive is technically a spinning disk, but usually it's going to be a large server system with a RAID array. So, "what next?" I would say at least a copy on another hard drive, as well as a copy on another type of media. But as far as speculating about what the next big media thing is going to be, it's spinning disk into the future, just because of the cost. And obviously hard drives have gotten much larger, both at the commercial level but also for IT departments and server racks,
so the cost has gone down enough that I think spinning disk will remain the storage of choice. But, you know, it might be possible to transition from an external or internal drive to server systems, RAID arrays, or cloud storage. So a couple thousand JPEGs is probably not a whole lot, and it would probably be pretty easy and cheap to store in an online backup. So in cases like that, I think the cloud does offer some advantages as far as cost and management. You are dependent on, you know, the corporate infrastructure and things like that, but there's always a backup, so you will have another copy even if that fails.

Okay. So with that, thank you again for all of your help and your wonderful presentation today. I wonder if you could just say a quick word about the homework, which involves calculating some checksums, right?

Sure. So essentially we're going to have people grab a very simple text file and use an online tool to generate a checksum. This is basically just uploading a document to the site, and it will give you an MD5 checksum. And then what you're going to do is make a very small alteration to the file, re-upload it, and compare the two checksums. So, you know, it's important to think about fixity. What's great about fixity, and about these algorithms and checksums, is that with the tiniest, tiniest change, like a single one or zero out of the millions of them that might comprise your file, the checksum will be dramatically different. They won't even look anything alike. So we're just going to have a simple exercise: generate a checksum, then change a file, then generate another checksum and match them up. And basically
we're doing a fixity audit, essentially, on the file; we just happen to have changed it intentionally.

Right, and there are some additional exercises in a handout, so if you want to try it at a more advanced level, there are instructions on how to do that as well.

Yeah. We listed some software tools, and you can essentially do the same practice but on a whole directory, so you could do 10 or 20 files and do the same exercise. And then we have a couple of other exercises. One is to work with the Levels of Digital Preservation document, and there's a website for this if people want to comment or give more feedback. And obviously my email here will be available, and I'd love for people to get in touch with me and ask questions and talk more about that document as well as the rest of the presentation. But the idea is to use the levels to assess where you're at and where you think you could be, and how you might try to accomplish some of the best practices that it lists there. And then we have a couple of other exercises, basically answering questions to determine your institutional resources and expertise, how you can build those, and what questions you need to be asking when thinking about building a storage system or even making backups.

Well, great. I think that is all the time we have. Kristen, anything else?

Thank you, Danielle and Jefferson. Thanks, everyone, for helping each other in the chat, and as Danielle said, we will get answers to some specific questions you asked, maybe out to you privately. Again, join us on Monday at two o'clock Eastern time for our last class in this course. Thanks again, everyone.

Thanks, everyone.
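
For anyone who would rather try the homework exercise locally than with an online tool, the checksum comparison described in the session can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool used in the course: the `checksums` helper and the sample text are made up for the example, and the bytes in memory stand in for a small text file. It generates an MD5 and a SHA-256 checksum at the same time, makes a small change to the content, and compares the results.

```python
import hashlib


def checksums(data: bytes, algorithms=("md5", "sha256")) -> dict:
    """Return a hex digest of `data` for each named algorithm.

    Hypothetical helper for this example; generating several digests
    at once means no single algorithm is relied on.
    """
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


# Sample content standing in for "a very simple text file" (made up here).
original = b"A very simple text file for the fixity exercise.\n"

# A very small alteration: one word changed by a single letter.
altered = original.replace(b"simple", b"sample")

before = checksums(original)
after = checksums(altered)

# Compare the two sets of checksums, as in a fixity audit.
for name in before:
    status = "match" if before[name] == after[name] else "MISMATCH"
    print(f"{name}: {before[name]} -> {after[name]} [{status}]")
```

Running it should report a mismatch for both algorithms even though only one letter changed, which is the point of the exercise. To work with a real file instead, the bytes could be read with `open(path, "rb").read()`, where `path` is whatever file you choose.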