Hello everyone, this is Kristen Lays from Heritage Preservation, and welcome to our second class in our online course Caring for Digital Materials: Preventing a Digital Dark Age. This is part of our series Caring for Yesterday's Treasures Today, which has been made possible through generous support from the Institute of Museum and Library Services. And we want to thank Learning Times for their help in producing the website and this webinar for us today. If you are hearing an echo, you may have the webinar open twice, so you might want to check that. Hopefully your sound is otherwise good, and if not, feel free to just drop us a note in the chat. I wanted to apologize again if you had any trouble on Tuesday logging in. There was such tremendous interest in this course that I think we may even have crashed the server, and already we have more than 300 people joining us today. But we're very glad you've come back, and we hope to give you a very informative session. We will be doing some more classes next week. So today we have Jacob Nadal talking about file conversion. Next Tuesday we'll be talking about metadata; next Wednesday, safe archiving through backup copies; and then the following week, on Monday, we will finish our course by talking about networks for preservation. Just a few quick notes about what you can find on our course webpage. I've put the URL up here, and you can get to it through connectingtocollections.org. We have a link to the PowerPoint slides you'll see today, so you don't have to rush and take notes; you can print them out and have them handy. There's a link to our homework assignment for this class, and a link to any resources that have been referenced in it. If more are mentioned during our chat today, we'll make sure to get those up as well. We also have some great suggestions for links from Tuesday's class, and those are going to be put up hopefully today or first thing tomorrow. Just a note: the recordings will not be posted on the page. Instead, you'll get these by email. Once the entire course concludes, then we'll put up all those recordings. So thank you for looking at that site for all the information that you might need. You have the option, in coming to these courses, to work toward a certificate of completion and a digital credential. Just make sure you've registered through our site, that you attend each class either live or by recording, and that you complete all homework assignments. And while we don't provide individual feedback on homework assignments, we'll give a little bit of general feedback to the group. Make sure that you've done all of this by Monday, April 22nd, a week after the very last webinar airs. Okay. I also wanted to make sure that you know about the online community. You probably already have joined. We're actually very close to breaking the 3,000 participant mark on that. So we have almost 3,000 people that have joined, and we're really happy it's growing so much, and we hope it continues to grow. So if you've had questions that didn't get addressed today, or other questions about collections care, this is a great place where you can ask your peers. A number of conservators have been active participants on this list. And if you have any other questions about the course logistics or anything we talk about today, here's our contact information. So please feel free to send us an email or give us a call, and we'll make sure to respond to you.
Before we get going too far into this, I wanted to find out a little bit about you. So I want to ask a poll question just to see what type of institution you're coming from. If you could just click your response here, that would be great. It would help our speakers get a handle on who the audience is today. Great. Looks like a number of libraries, a few museums. And we know that not all of you work at an institution. So if there are any particular questions you have, or you're having a hard time wondering how the homework can be applicable to you, please do let us know. Basically, the homework assignments discuss institutional collections, but you can always think about your own personal digital collections when you look at those homework questions. We've had about 300 people participate in the homework from Tuesday's class, so that's really great. Okay, great. So it looks like a nice mix, a nice balance of institutions represented here today. I'm going to go ahead and close this poll, and I want to ask one more question if I can. And that's just to get a sense of the size of your institution. We typically do this by asking what the budget of your institution is. It's a slightly imperfect way to measure, but it's one of our best ways of doing it. I know that many of you are on smaller budgets, and our speakers have done their best to scale some of the advice they'll give to make sure it's applicable to people of all types and sizes of institutions. So we'll give it just a few more moments for you to log in your reply. Okay, that's great. All right, I'm going to go ahead and close this poll. And then I wanted to introduce you to Danielle Plummer. We have seen her responding to questions in Tuesday's webinar. She is really helping us by coordinating all the content for this course and helping answer your questions, so we can get you great feedback in the chat. And she's the one who asked all our speakers to participate. She's just going to give you a quick overview. And while she does that, I'm going to actually drag away the hello box, not to cut anybody off, but we're going to start the moderated chat now. So Danielle, I'll turn it over to you.

Thank you, Kristen. And we really thank Heritage Preservation for making this opportunity possible for us to work with you all a little bit on digital preservation. Because of some of the problems we had on Tuesday, we wanted to back up a little bit and just give you some background about why this class was put together and the sorts of things we're hoping you get out of it. So the first question, and a few people asked this on Tuesday: what is digital preservation? Here is a definition; our main presenter today, Jacob, and I actually worked on helping to come up with these definitions. The short definition is: digital preservation combines policies, strategies, and actions that ensure access to digital content over time. Many of you work in institutions that have collections of physical materials, paper, artifacts, perhaps audio-video materials, and you preserve them. Digital has to be treated the same way. We have to think about how to preserve our digital content going forward, and that's why this class was put together. There are reasons to do it. First is cost. If you've gone out and done a digitization project, you know that it is not free. If you pay a vendor, you have a really good sense of how much it costs you.
But if you do it in-house with your own staff, you may not even think about the fact that the project took, say, 400 hours of staff time. The point is, if you preserve the digital content you created, you don't have to do that work over again. So cost is a big reason why, going forward, we want to keep the things we've done. There are some things you may want to do over because you don't like the original results; that's a slightly different case. But we don't want you to waste your money having something done and then losing it. The second reason is institutional mission. This is especially true for born digital materials, things that you don't have any kind of physical counterpart for. A database is a really good example of something that you just can't print; that just doesn't work. Email, to some extent. And then there are also things where you know you're not going to be able to preserve the original physical form indefinitely. Right now this is true of audio and video tapes, where the magnetic media that were used for these are degrading, and we have to figure out how to preserve the intellectual content even if we can't preserve the actual tape. So this is part of your institutional mission to preserve these materials, and you have to think about that. There are two examples I just wanted to share briefly with you. I've got links, and you can find them yourselves pretty easily. The first is the 1960 U.S. Census. In 1960 the census was done by computer: the materials were sent to people, they were sent back and scanned, and computer tapes were used to process all of the census returns. The National Archives' mission is to preserve the records of the United States government, and so they had to figure out how to preserve this digital version of the U.S. Census. There were some problems, and they've written an article about it. The good news is that most of the data is still there; they were successful with their digital preservation project. They now estimate that perhaps 5% of the digital data was lost or corrupted and won't be recoverable. We won't find out exactly what that data loss was until 2032, when that census will be available to the public. Another example: in 1986 the BBC in England did a project called the Domesday Project, where they went out and collected stories from all over England. This was to commemorate the Domesday Book, which was compiled after William the Conqueror invaded England in 1066. They put all of this together and they put it onto two laser discs. Most of you have probably never seen a laser disc; I saw them, but I never owned a laser disc player. Long story short, only about 10, maybe 13 years after this project was done, they realized laser disc was a format that had not caught on, and they needed to figure out a way to preserve the information in a device-independent state. So they actually spent almost 10 years getting all of the data back off of the laser discs where they'd stored it, figuring out how to preserve that information and make it accessible. So again, it turns out to be another digital preservation success story, but it opened a lot of eyes about the importance of this. I'm sure that you all can think of things in your institutions or in your personal life: files that you no longer have access to, materials that have been damaged or destroyed, either because of disasters or just because nobody thought about trying to preserve them.
So again, that's why we're doing this class. I want to make a note about jargon, because this is another comment that came up on Tuesday. Some of the terms and concepts used in this series may not be familiar to you, because this is a new field. Really, we've only been doing digital preservation for a short time. Some people have been doing it for perhaps 20 years, but for most of us it's been within the past five years or so. So we're still trying to figure out what to call it. Some people are calling it digital curation, some digital preservation, and there are various other terms for the same idea. And people from different communities of practice, in libraries, archives, museums, and historical societies, use different terms for the same thing. That makes it really hard for us sometimes to share information effectively. I put up a link to the National Digital Stewardship Alliance, NDSA, which is an organization run by the Library of Congress. They have a glossary of digital preservation terms up at digitalpreservation.gov/ndsa-glossary.html. It's not complete, but you can find a lot of terms there. If you don't recognize a term or concept that we use, please ask us in the chat to explain it, and we'll be happy to do that. So, series goals. At the end of the series, we want you to have a better understanding of the inherent fragility of digital objects. We want you to be able to acquire information to help you select preservation formats, metadata, and backup systems for your digital objects. And we want you to be able to identify one or more actions that you can take to improve your institution's digital preservation efforts. As Kristen mentioned, this is the second session, Convert It to Preserve It: Digitization and File Conversion. It will cover both creating digital materials and also some conversion tips for making sure that they can be preserved. On Tuesday, we did the first session, which was an overview of digital preservation. Kristen mentioned that we've gotten a lot of homework results in from that one, and I just wanted to pass on a note from the instructor, who said she was very pleased. She thought the responses were very realistic. Some people put roles, like archivist, on their dream team of who they'd have helping to preserve materials, and some people put the names of actual individuals in their organizations, which was great. The idea is just to get you thinking about how to realistically approach your projects, and that was a wonderful success for the homework. If you have specific questions about the homework, feel free to contact us about it. After today, we have two more sessions next week and one the following week, as Kristen said. So please, I hope you will come back and join us for those. And with that, I want to turn it over to today's instructor, Jacob Nadal. Hi, everyone. Oh, sorry, let me just introduce you first. He's the Director of the Library and Archives at the Brooklyn Historical Society, and that's a new job for him; he just started it in April. From 2008 to 2012, he served as the Preservation Officer for the UCLA Library, and from 2005 to 2008, he worked as the Field Services Librarian and Acting Head of Collection Care in the Preservation Division of the New York Public Library. He has a Master's in Library and Information Science, and he's been widely recognized for his role on various task forces and groups within the American Library Association's Preservation and Reformatting Section and in other areas.
So we're very glad to have him today to talk to you about digitization and file conversion.

Hi, good day, everyone. Obviously, I'm eager to talk about this as well. Thank you, Danielle, for the introduction, and to Kristen at Heritage Preservation, and to Learning Times: thanks for providing all the technical support required to get this program going. Today we're really going to talk about how to make something digital if it's not digital in the first place, and then, on top of that, to talk a little bit about the characteristics of digital information that make a file, the sort of basic unit of digital information, long-lasting, long-lived, and eventually preservable. Today we'll talk about a couple of major types of digital materials. Text and images will be our primary focus; these are the core of what exists in most library and archives collections. We'll also spend a lot of time talking about audio. More and more people are doing audio, not just for music but for oral history projects, for webinars like this one. Audio, I think, is a format that's growing in importance in our community. These three formats, text, images, and audio, are also the formats for which we have pretty good preservation answers at this point. Towards the end of the webinar, I'll talk briefly about video, data, and interactive systems. We de-emphasize those a little bit, partly because of the types of institutions who are involved in this (it's less common for them to have these types of collections), and also because this is an area where preservation consensus is still emerging; there are lots of very technical questions being debated and very few guarantees about how to make the right choice. For each of the formats we talk about today, I'll tell you a little bit about how the format is designed; a digital format has a particular decided-upon structure and engineering behind it. We'll talk about the risks and advantages of those formats from a preservation point of view. And we'll talk about the key specifications for creating, or working with vendors to create, files in those formats. I want to just touch again on the definition of digital preservation and show you the medium-form version of that definition; there's actually a long form as well, up on that ALA website. The medium-form definition expands on what Danielle presented to you by mentioning both reformatted and born digital content. Today we're going to focus largely on reformatted content, but we'll also address issues of born digital content. So at this point in the webinar we're really at the beginning of this process, talking about how you get digital content. The other speakers will follow up on pieces of this, media failure and changes in technology, and talk about rendering and authenticating content over time. So with that said, let's start in on text. Is this one of our poll questions, Kristen? Yep, it's one of our poll questions. Let me bring this over. Right. And I have to make it larger; hold on a sec. And just to note, if you don't currently do this, each of these poll questions will give you a place to say "not currently, but plan to." And if you don't know, be honest about that too, because that tells us a lot. Yeah. I suspect actually if we did this poll at the beginning and the end we might get different results. I'm hoping so, as you talk about what digital text means. Yeah. OK, good. So it looks like a lot of attention to either having been involved in text digitization or having plans for it. All right.
So text in the digital world really just means text. It's just letters. And the encoding of text we like for digital preservation is UTF-8, which is an encoding of Unicode. Now there's something called ASCII text, which we'll talk about briefly, which is pretty limited. It's 128 characters (a lowercase a and a capital A are two different characters), plus punctuation marks, numbers, and some control characters, things like carriage return or delete that are used for inputting text data. Useful if you are speaking English, decreasingly useful if you're speaking, say, French or Spanish, and of almost no use to you if you're speaking Chinese or Japanese or Hindi. Unicode is a much broader character set; we'll look at it in a moment. And UTF-8, the Unicode Transformation Format 8, is the bread and butter of digital text preservation. And I'll say again, it's important to remember this has no font face associated with it, very little in the way of layout, nothing much more than a carriage return. But text is what's critical for searching and manipulating data. If you type a search into the Google search box, it goes out and looks through the raw text of web pages. It doesn't particularly care if the text is presented in Times New Roman or Arial or Helvetica; it's looking just at the character data. UTF-8 is also important because it's the default encoding of XML, and you'll hear, in my session and in the sessions that come, people talk about XML over and over and over again. That's one of the common ways to exchange and manage data. So this is the U.S. ASCII code chart for the format I was talking about just a moment ago. You'll see an ASCII character has seven bits to it. These are the zeros and ones that are written to the hard disk. So if you wanted to write the capital letter A, and you look over in column 4, row 1 there, you would know the bits that would be on the disk: a 1, a 0, a 0, a 0, a 0, a 0, and a 1. And that's an ASCII character, the capital letter A. As I said earlier, this is useful for certain languages, not for others. It's a pretty limited character set, so the computing community has really moved on to Unicode. Unicode is an idea, or a specification or standard; it's a way of representing text. UTF-8 is the format; UTF-8 is the actual mechanism for recording Unicode to a disk. It's important to know this: you'll hear people use the two terms essentially interchangeably in conversation, but when it comes down to the rubber on the road, or the bits on the disk in this case, UTF-8 is what you'll actually encounter. UTF-8 characters are longer. Instead of seven little bits, they can be from 1 to 4 octets, 1 to 4 eight-bit bytes. The first 128 characters in Unicode are the US-ASCII characters that we looked at, so it's backwards compatible with US-ASCII. After that, there are lots of other things; we'll look at a UTF-8 chart in a second. The virtues of UTF-8, aside from doing a better job of representing language: it's easy to identify. If there's an unknown string of text on a disk, there are simple search patterns that will correctly identify UTF-8 more than 99.5% of the time. This is very important if you ever have a digital preservation disaster where you have media that you don't know the source or contents of; something like UTF-8 is easier to find. And it's the default native encoding for XML, which of course has multi-language support.
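To make the encoding talk concrete before we look at the chart, here is a minimal Python sketch (the file name at the end is just an example) showing the seven ASCII bits of a capital A, how plain ASCII text is already valid UTF-8, and how characters beyond ASCII take two to four bytes:

```python
# The seven ASCII bits for capital A, read back from its UTF-8 encoding.
text = "A"
print(format(text.encode("utf-8")[0], "07b"))  # -> 1000001

# The first 128 Unicode characters are US-ASCII, so plain English text
# is already valid UTF-8, one byte per character.
print("plain English".encode("utf-8"))

# Characters beyond ASCII take 2 to 4 bytes each in UTF-8.
for ch in ("é", "অ", "字"):
    print(ch, ch.encode("utf-8"))

# When you create text files, state the encoding explicitly rather than
# trusting the platform default. (The file name here is hypothetical.)
with open("transcription.txt", "w", encoding="utf-8") as f:
    f.write("OCR output and re-keyed text belong in UTF-8.\n")
```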
So this is some of the UTF-8 character set, which goes on and on and on. You can see here some Armenian capital letters, some Bengali letters, and some Thai characters. The UTF-8 code charts include many languages, and they include many symbolic sets, such as mathematics, so you can imagine there are some disciplines that rely on this very heavily. If you're looking at conversion, and you're working with someone to, say, do OCR or text transcription for you, you should, at this point in the game, in the year 2013, be requesting UTF-8 encoded text files of all those transcriptions and OCR. And if you're using a program to do that conversion, say you're running an OCR software package, get into the settings and make sure that you're creating UTF-8. That's going to give you the most long-lived text. And that is something you would do both for conversion and for creating new text: if you are creating text documents, web pages, or XML, you want to make sure you're creating them in UTF-8. Now, those things we just saw, those Unicode character sets and that ASCII chart, were images that I put on my slides. They were, in both cases, JPEG images. Computers, it's important to remember, don't read; they encode and decode. Where we happen to know that something is a letter, the computer sees something like 1000001. For us to have what we think of as a digitized book, we need page images, that is, pictures of the way the pages look, as well as a text transcription of each page, and metadata that binds all of those things together in the proper order. So when we're doing reformatting, when we're digitizing materials, the way we generally get the text is to either re-key it or to use optical character recognition, OCR. Some of you I know have encountered this, and it sounds like some of you probably will in the near future. It's worth having a little buyer-beware moment here: OCR accuracy is usually reported as character-level accuracy from ideal sources, that is, a relatively modern, cleanly printed, typeset document. It's important to note that the actual outcomes you get, in terms of the number of words accurately transcribed and the number of accurate characters from less-than-perfect sources, are lower than that. OCR has pretty high character-level accuracy at this point. Its word-level accuracy, and its accuracy for subpar sources, is a little bit lower, but it often gives you a pretty usable text. The problem you will encounter here is that a single character misspelled or misidentified in a word will ruin a text search, right? Remember, computer searching uses that raw text, and if that character data is bad, the search queries I type in may not return accurate results. So depending on the intentions of your project, it's worth paying attention to whether OCR or re-keying is necessary. Text also has sort of two use cases. We've been talking a lot about text as a source of information, text as a document. It's also important to recognize that text is the building block of the digital library. XML, HTML and Cascading Style Sheets, UTF-8 encodings, the PHP code that many websites use as their scripting language: all of those things are written and recorded in text. So in addition to the documents in the digital library, the digital library itself is probably largely built up of text files that are interpreted in a certain way. A web page, for instance, is a text document, and when you point a browser at that text document, it knows to render colors and positions and layouts. For documents in the digital library, often you will get these as text plus.
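Before we turn to those "text plus" formats, here is a quick way to put numbers on that character-level versus word-level accuracy point: a back-of-the-envelope Python sketch, assuming errors land independently and taking a nominal 99% character accuracy as the vendor's claim.

```python
# If each character is right 99% of the time, a whole word is right only
# when every character in it is right.
char_accuracy = 0.99
word_length = 6                     # an average-ish English word

word_accuracy = char_accuracy ** word_length
print(f"{word_accuracy:.1%}")       # ~94.1% -- roughly 1 word in 17 fails
```

That failing word is the one a search query silently misses, which is why the OCR-versus-re-keying decision matters.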
So a Microsoft Word document has text in it, but it has a lot of other things to control style and formatting and layout. I want to touch on this lightly; we could go way down the rabbit hole here. But just to say that Microsoft Word has become a sort of de facto standard, and the most recent version is actually XML-based, because Microsoft wants to do document sharing on the web. This doesn't make Microsoft Word a great preservation format from an engineering point of view. No one who was trying to create a long-lived document format would create Microsoft Word. It makes it a pretty good preservation format from a risk management point of view, however. If there's ever a preservation problem with Microsoft Word, you will not be the only person having it; hundreds of millions of people will be right in there with you, and you can look to a community solution. PDF is another way that we get text in our collections, especially born digital collections. PDF is a format with an open license. It can contain text, images, audio, video, fill-in forms, a wide variety of things. So know that there's a subset of PDF, PDF/A, for archival, and it's based on (I have a misprint in my slide) PDF version 1.4, not 4.1. PDF is currently, I think, at version 1.7. This PDF 1.4-based PDF/A has a subset of PDF features that are considered preservable. So if you have PDFs and you can check their version, or convert them to PDF/A successfully and see that no content has changed, you have what's going to be a very long-lived file format. Later versions of PDF (1.5, 1.6, 1.7) and PDF with complex objects in it can present some problems down the road. So again, just to review the key specifications here: UTF-8-encoded Unicode text is one of the things we're looking for. We're looking to use XML-based formats for markup, for marking up a web page, for including metadata in documents; XML should be the way that we're carrying and supporting our data. Be clear about how you're getting text out of documents: if you're converting printed, analog text to digital, learn a little bit about OCR, investigate re-keying, and make sure that you have metadata that carries those details, so that the archivist who follows you understands that this text came from, for instance, an OCR engine that was run in 2013. That will help later generations understand what they're working with. For born digital documents like word processing files or reports, know your versions: PDF/A whenever possible, and the .docx or .xlsx versions whenever possible for office documents.
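Since the PDF version turns out to matter, here is one concrete check. Every PDF declares its version in its first few bytes, a header like %PDF-1.4, so a minimal Python sketch (the folder name is hypothetical) can triage a batch of files. Note that this only reads the declared version; actual PDF/A conformance needs a real validator.

```python
from pathlib import Path

def pdf_version(path):
    """Read the version declared in a PDF's header, e.g. '1.4'."""
    with open(path, "rb") as f:
        header = f.read(16)
    if header.startswith(b"%PDF-"):
        return header[5:8].decode("ascii", errors="replace")
    return None   # no PDF header at all -- worth investigating

for f in sorted(Path("documents").glob("*.pdf")):   # hypothetical folder
    version = pdf_version(f)
    if version and version > "1.4":   # string compare is fine for 1.0-1.7
        print(f"{f.name}: PDF {version} -- later feature set, check PDF/A conversion")
    else:
        print(f"{f.name}: PDF {version}")
```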
So we'll pause now and take some questions about text, and then we'll move on to our poll for image formats. Danielle, do you want to start us going? Okay. We did have a number of questions that came in as you were going over that, and a couple of them are forward-looking, so we may address them later or in other sessions. The first one was: will information be given on the problem of computer systems and software going out of date, i.e. Windows 98, Windows XP, Windows 7? Did you have anything you'd like to share about that? I'm happy to take a swipe at it; this may get touched on in later sessions too. Let me rephrase the question a little bit to say that there are actually two factors here. One is the operating system itself (these versions of Windows are different operating systems), and then there are the pieces of application software, the programs that run on that operating system. This is an area where there's sort of a spread of possibilities. Some formats work gracefully between different programs, and the formats we talk about today are in some senses preferred by the preservation community because they have a high degree of independence from both an operating system and a particular program. Your options, when a format is tied up with a particular OS and program, are fairly limited. You really have to either run that program on a virtual computer, what's called emulation, or maintain a working version of the old system, or convert. Fortunately, I think as we've moved into the network era, the problem of operating systems and proprietary formats is in some ways locked into this sort of turn-of-the-21st-century moment; going forward, we actually see much more ease and compatibility. So to a certain extent this problem has solved itself with the Internet, but yes, there certainly can be a set of factors you need to watch. Next one, Danielle. Okay. This one I'm pretty sure we'll get to in a bit, but: what are the best data formats to save data in that will carry over through time and changes in computer systems? Yeah, so we'll actually talk about this over and over today. As we go through each of these formats, we'll talk about the preferred preservation formats. Those aren't always the ones you get for a born digital collection, nor are they always the ones that are available to you. So the other piece of advice: if you can't have the optimal standard, run with the herd. If you choose any of the Microsoft formats (again, no preservation engineer would have created those formats for the purposes of preservation), the fact that millions of other people have the same problem you do means that you're pretty safe from a risk management perspective. You'll know when a change is coming, and when a change does happen, there will be lots of tools to help you migrate files forward. So try to choose the best standards and, failing that, run with the herd. Okay. We had a series of questions about PDF, OCR, and PDF/A. I'm going to condense some of these into maybe a couple of different questions. One was about OCR and the challenge of character recognition for alphabets that contain things like French accented letters, or handwriting; someone then asked about re-keying and wondered if you could explain that a bit more. Sure. So OCR is pretty solid for English and Romance languages, and progressively less available as you move into character sets outside of the Latin characters. Google notably has OCRed books in dozens of different character sets in their Google library project. They haven't necessarily made that software available to the rest of us, and accuracy declines for non-Latin scripts. So that's still a technological problem that's not resolved. As for how re-keying works, there are a variety of ways. The simplest one is just to have a person sit and type what they see on the screen or on the page. Many people have outsourced this function, often overseas, to shops where two operators type the document blind from each other, and then their two transcriptions are compared. This is actually highly effective, and it works even if the operators aren't native speakers of the language; in fact, sometimes the results are better if they're not. The chances of them both mistyping the same word are infinitesimally small.
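A minimal sketch of that double-keying comparison in Python. The transcripts here are toys, and real services use proper alignment tools, since two keyings can differ in length, but the principle is just this:

```python
# Two independent typists, then flag every position where they disagree.
keying_a = "Now is the winter of our discontent made glorious summer".split()
keying_b = "Now is the winter of our disconnect made glorious summer".split()

for i, (a, b) in enumerate(zip(keying_a, keying_b)):
    if a != b:
        print(f"word {i}: {a!r} vs {b!r} -- send to a human reviewer")
```

Because the two operators almost never make the same mistake in the same place, nearly every real error surfaces as a disagreement.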
And for large conversion projects, re-keying is sometimes even more cost effective than doing OCR and then cleaning up the OCR transcript. It's also something that doesn't require you to physically send the originals; if you have good image capture, it can be done remotely. So there are a number of vendors that will do re-keying for you, and it's just what you would think it would be. Let me also say one more thing about PDF. As users, we think of PDF as a file format. It's better, with your sort of preservation hat on, to think of PDF as a metadata format. PDF is a set of instructions for holding together different types of media, and the most common use case for PDF is holding together text and images to create portable documents. If you think of PDF as essentially a metadata wrapper, something that says: here is a page; it's page one; it has this text on it; the text is formatted to look like this; in the middle of the text there is an image; and the name of this image is firstimage.jpg, then PDF will make more sense in terms of how it functions, or fails to function, as a preservation medium. The earlier versions of PDF are pretty simple. They have a limited number of image formats, and they support a limited number of things they can do with text. So their complexity is low, and the types of images included in them are limited to a preservable subset. Other versions of PDF include more types of content, and content that can have more complex interactions in it. PDF is also a fairly open standard, so manufacturers add their own parts to it. PDF software, like the pro versions of Adobe Acrobat and PDF distillers, will let you inspect files and understand the parts of them. It's important to understand that PDF is a pretty broad category in computing terms. When we think about it in libraries and archives, usually we're thinking about the fairly simple subset of PDF that's for text and documents, which is by and large the majority of the PDF that's out there. Great, thank you very much. I don't want to hang us up too long with this; people had a lot of questions, and we may come back to some of them at the end. Yeah, great. All right, so let's do the imaging poll, and then we'll move into talking about digital image data. Yeah, lots of use of imaging. So, two main types of images: raster images, which we'll focus on today, and vector images, which we won't talk about a whole lot. Raster images define a grid of colored dots that we call pixels, and that's the default way of capturing photographic data and the like. Vector graphics you actually see quite a lot: most clip art and shapes you would put on a PowerPoint slide are vector graphics, and they're mathematically defined. A vector graphics file says something like: draw a circle, fill it with red, give it a black border, and make it this radius. So you can change and alter the size and characteristics of vector graphics fairly freely, which is something you can't do with raster graphics. Vector graphics are almost always born digital; it's pretty laborious to convert analog sources into vector graphic equivalents. It certainly can be done, but it's a fairly rare use case, so we're not going to spend time on it today. For imaging, scanning photographs, taking pictures of art or artifacts, or getting digital photography, the standard format, the one that most people go to most of the time, is TIFF, the tagged image file format. This format has been around since the 1980s.
TIFF was developed by a company called Aldus and then turned over to Adobe. TIFF has been stable since the 1990s, so this is a format with decades of consistent use behind it, and it's usually used as the place to store uncompressed image data. JPEG 2000 is also discussed in the preservation community as a new alternative to JPEG. It has a lot to recommend it technologically, but there's a real shortage of tools to create and work with JPEG 2000 images. TIFF is the safe, effective bet right now. In those files, your TIFF files or your JPEG 2000 files, there are a couple of things that we're looking for. First and foremost, they should have uncompressed image data. TIFF and JPEG 2000 are both capable of storing compressed data; it's important to understand that TIFF doesn't mean uncompressed. TIFF gives you the option of uncompressed. When we're digitizing images, we generally look to get 300 or more pixels per inch. At reading distances of about a foot or two, that's about the point at which the human eye can't distinguish the individual pixels, the individual dots in an image. And we look for 24-bit color: that means there are eight bits, eight points of data, assigned to each of the color channels, red, green, and blue, that a computer uses. If you have more than 300 pixels per inch, and most people aim for 600, that effectively gives you the ability to zoom in without seeing pixelation in an image. The other thing to note is that we want to have color calibrated and profiled with an ICC color profile. This is something that you ask the computer to do for you, not something that you do with your own eye; we're going to look at some slides in a second that show why that is. To capture good images (and this is important if you're thinking about how to set up your own workflow, as well as what to look for in a reliable vendor's workflow), equipment should be set up and profiled (there are software profiling suites that are used to do this), and then it should be left alone. There should be little intervention. The master file, the preservation file you create from that, should be unaltered, and it should have a color profile attached (we'll talk about what's in that), but it shouldn't be something you color correct. Editing, retouching, and color correction should always be done on a secondary copy for a particular use. You might create a second copy for printing a poster or a billboard or putting up on the web, but the original master should stay uncorrected. You can see right here probably what's wrong with this image, right? It's got a yellow and a blue cast to it. I'm going to show you a color panel in just a moment, and you'll see there's a black dot in the center of the screen. I'm just going to ask you to look at that black dot, and there will be a little animation that comes along to help you do that. We're going to look at this for 20 or 30 seconds. Just keep staring at the middle of the screen where the little sun is spinning and blinking in and out, and try to keep your eyes focused on the very middle. And here's the original image again. At first look it probably seems like it doesn't have a color cast, and as you look at it more, you should see the sides fading back towards that yellow and blue tinge, right? Same image, before and after.
Your eyes, having been exposed to the contrasting colors, do color correction automatically. One of the big mistakes people make in imaging and image capture is to try to do that color correction to the digital image by eye. The color of the shirt you're wearing, the cloudiness of the sky outside, whether you had your cup of coffee that morning or not: all of these can affect what your eyes are going to see. Let the computer do its job. In any number of laboratory studies of people trying to color correct images, especially professional photographers, the result has just been that they add noise to the file. They don't get any closer to accurate original color. This slide is showing you, provided you have a 24-bit monitor, all of the colors that can be shown in a 24-bit RGB file. There are about 16.7 million of them. That's enough to capture most of the visible spectrum that you can perceive a difference in. This slide shows you the actual visible spectrum, as well as your computer can show it. That yellow zone shows what's called the RGB gamut: those are the colors that computers and digital imaging devices can display. You'll see it doesn't quite overlap with the CMYK gamut, the cyan, magenta, yellow, and black gamut, which is what printers print. And neither of those covers the entire visible spectrum. I think if you look closely at this, though, you'll see that most of the dramatic color changes, from white to fairly pure red, green, and blue, are covered by that gamut. The things that are excluded are different, but they're minimally different. So the RGB gamut covers the most important part of the visible spectrum, and it covers the most important parts of the printed spectrum. What can happen without color calibration is essentially that the yellow zone, the gamut that's captured, drifts somewhere else on the visible spectrum. The point of calibration software is to make sure that you're imaging the same part of the visible spectrum every time you take a picture or make a scan. The way this works is that a target is scanned on your computer, displayed on your monitor, and compared in software to a set of known values, and then a sort of crosswalk is created. That's what the color profile contains: the scanner sees red a little bit off from the way it should, and the profile makes the color correction possible. These things are all covered in greater depth in some of the associated links. What's important for you to know now is that your imaging devices, and your vendors' imaging devices, give you a window onto the visible spectrum, and it's important that you and they have a plan in place to make sure they always look through the same window at the same portion of the spectrum, that they always get color the same way, so that you have consistency from image to image. So again, key specs: uncompressed data; 24-bit RGB; a color management plan, which usually shows up as an ICC color profile; the TIFF format for your images, except in very particular cases; and 300 or more pixels per inch of visual resolution.
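Those key specs are all checkable in software. Here is a rough QC pass in Python using the Pillow imaging library; this is a sketch, not a full validator, and the "masters" folder name is made up:

```python
from pathlib import Path
from PIL import Image   # pip install Pillow

for path in sorted(Path("masters").glob("*.tif")):   # hypothetical folder
    with Image.open(path) as img:
        problems = []
        if img.mode != "RGB":                  # RGB = 3 channels x 8 bits = 24-bit
            problems.append(f"mode {img.mode}, not 24-bit RGB")
        if img.info.get("compression", "raw") != "raw":   # "raw" = uncompressed TIFF
            problems.append(f"compressed: {img.info['compression']}")
        dpi = img.info.get("dpi", (0, 0))      # missing resolution tags also flag
        if min(dpi) < 300:
            problems.append(f"resolution {dpi} below 300 ppi")
        if "icc_profile" not in img.info:
            problems.append("no ICC color profile attached")
        print(path.name, "->", "; ".join(problems) or "OK")
```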
With that, before we move on to audio, let's take a little time and do some imaging questions. Okay, once again we have a lot of questions coming in. Some of these you touched on briefly, but others we'll try to go over. The last bit there, you talked about 24-bit color, and a number of people were asking about whether even black and white documents should be scanned as 24-bit color, or whether there are lower bit depths you should be using; basically, some of those technical details. Sure. So it's important to differentiate between black and white as we think of it, say black and white photography or woodblock printing in a book, and black and white as the computer understands it. For a computer, black and white means one bit: zero or one, black or white, no exceptions. Printed black and white documents and black and white photos are not black and white; they have a whole range of tonalities in between, in the gray spectrum. So if you scan a printed black and white photo into a computer as a one-bit file, you'll get a very poor representation of it. There are 8- and 16-bit grayscale formats that give you a range of gray tonalities, mixings of black and white. I generally don't recommend them. There may be particular use cases where they're valuable, but generally speaking, the amount of data you save versus the amount of data you lose by not capturing full color isn't worth the trade-off. And for really capturing source documents, even what we think of as black and white is generally color, or gray to a certain extent. So in general, 24-bit color is recommended, and storage is cheap enough now, and devices robust enough, that that's a pretty safe recommendation. Certainly if you have a photo archive that's all meticulously cared-for black and white photography, you may be fine with grayscale imaging. Likewise, if you're just doing text imaging of newspapers so that you can run OCR on them, you may be fine with grayscale imaging. But as a general recommendation, I think 24-bit RGB is the way to go. Great. We also had a series of questions that all related to issues of file compression. Some people were asking: if you have a JPEG, is it okay to just convert it to a TIFF? If you have PDF images, can you just take the pictures out of them? What are your recommendations on that? So let me start with the PDF first and say I don't think I would bother. The picture inside the PDF itself is probably encoded as a JPEG or a TIFF; PDF supports both of those formats. It's usually better to leave things in their native format unless that native format is highly problematic. In terms of compressed and uncompressed, there are lots of different ways to do compression. JPEG is open and well documented, so if you get a born digital JPEG photography collection, it doesn't present a particular problem for you; the chances of not having a JPEG decoder down the road are very low. I would give you a different answer for a proprietary compressed format. Something like Photo CD would be an example where you would probably want to save all of those photos into a different format, and an uncompressed TIFF file would be the natural choice, so that they're not tied to a particular proprietary format. With born digital resources, we take what we're given, and a lot of photography now comes in as JPEG, even high resolution photography. So that's certainly your starting place for a format. Great. And that brings up another question some people had, which is that they're getting digital images, and it sounds like they may be getting some in DNG or another raw format. The DNG format, I think, has a real strong case to be made for it. DNG is essentially a wrapper for raw camera data.
So raw data, for those not familiar with it, is just a readout of all of the voltages that are seen by the camera's image sensor. When you shine light at the image sensor in a camera, it reads those photons as a particular voltage, and then the camera, or the software, translates those into color. Each manufacturer has its own proprietary way of managing that raw data. Digital negative is something that Adobe developed to create a platform-independent way of sharing raw data. So the digital negative format, the DNG format, I think is a smart way to preserve born digital photography. The raw formats have a lot of risks built into them: if Canon changes its software or goes out of business, all of your raw files go out of business along with it. So DNG is a great way to get born digital photography. I don't know that I would recommend it quite yet for reformatting projects; I haven't encountered a lot of people who are, for instance, scanning an image and saving it as a DNG instead of saving it as a TIFF. You could make a good argument for doing that, but it hasn't become the consensus of the field yet. But digital negative is a viable format. Very good. I'm going to take just a moment (this would be your last slide; you don't have to jump there) to mention that we do have some resources linked on the course page, and some of these will address specific questions people are having about different formats and scanning standards. So, everyone, do take a chance to look through some of those. And Jacob, I don't know if you want to say anything about them now. Yeah, so you'll see a number of resources at the end of my presentation and on the website that talk about these different formats and also walk you through how to look for a vendor, how to go through the details of setup. We could have an eight-hour webinar on how to set up and color profile a scanner. So rather than do that, in this series we're trying to give you the smart questions to ask and then point you to the resources that will help you with the step-by-step details of that process. Great. So I think we'll just hold a few of these questions, and hopefully we'll have some time to get back to some of them. But let's go on and talk about audio. Sounds very good. All right, so you can guess the poll question: have you digitized audio materials? Okay. Yeah, a little different makeup here. It's interesting to me how many have digitized or plan to digitize audio materials; I think even a couple of years ago we wouldn't have seen those numbers in a series like this. It's becoming a more heavily used format for libraries and archives. So we're going to start again with the magic words, and then get a little bit into why those things matter and how we go about creating them. The tongue twister of the day, for me and all of you, is uncompressed pulse code modulation, which I can't say twice, even very slowly, which is why we have an acronym for it: uncompressed PCM. Like imaging, we're looking for uncompressed data, the raw capture of a device. And this uncompressed PCM, this method of capturing sound data, actually dates all the way back to telegraph codes and early telephonic systems, so it predates the computing era. The math and the thinking behind it is a very long-lived way of representing audio data.
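Before we get to the preservation file format, here is what PCM actually is, in a few lines of Python: sampling a waveform at a fixed rate and quantizing each sample to an integer. The numbers are deliberately toy-sized so the output is readable; real audio uses 44,100 or 96,000 samples per second at 16 or 24 bits.

```python
import math

SAMPLE_RATE = 8           # samples per second (toy; real audio: 44_100 or 96_000)
BIT_DEPTH = 4             # toy; real audio: 16 or 24
LEVELS = 2 ** BIT_DEPTH   # 16 amplitude steps here; 65,536 at 16-bit

def pcm_sample(signal, seconds=1):
    """Sample a -1.0..1.0 signal and quantize each sample to an integer."""
    samples = []
    for n in range(SAMPLE_RATE * seconds):
        t = n / SAMPLE_RATE
        samples.append(round(signal(t) * (LEVELS // 2 - 1)))
    return samples

# One cycle of a 1 Hz sine wave, eight coarse samples:
print(pcm_sample(lambda t: math.sin(2 * math.pi * t)))
# -> [0, 5, 7, 5, 0, -5, -7, -5]; the coarse steps are quantization error
```

Raise the sample rate and you catch more of the wiggles; raise the bit depth and each sample lands closer to the true amplitude, which is exactly the 44.1/16 versus 96/24 trade-off coming up.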
The format that is generally used for audio preservation is the broadcast wave file, BWAV, which is just the WAV file you may have encountered elsewhere in your life working with computers, with a metadata header attached to it. I know some audio engineers who give that header the pithy name of "catastrophe metadata," because it's fairly scant. It usually says something to the effect of: the title of this file is this, its author is that, and it's this long. It's enough that, should you have a disaster and lose your other metadata, you can at least determine what is in these broadcast wave files. Aside from that little metadata header, it's just a WAV file, an uncompressed audio stream. And just like images, there's resolution and bit depth to pay attention to in an audio sample. For the resolution, which is the number of times the audio is sampled in any given second, the recommendation is at least 44.1 kilohertz. That's 44.1 thousand times per second, 44,100 times per second, and that's the quality that's recorded on CDs. Most audio engineers who are working in preservation prefer 96 kilohertz, 96,000 samples every second. The bit depth that's recommended is at least 16 bits, which is CD quality, and preferably 24 bits. You'll see these paired up, so a 44.1/16 recording is CD-quality audio, and a 96/24 recording is the sort of preferred preservation audio quality. These two images are intended to start giving you a sense of why these things matter. There's something in audio engineering called quantization, where you're trying to make sure that your sampling catches all the different parts of the signal. You can see the blue line here represents the original signal, the actual audio signal, and the red jagged line shows where samples are taken. And this shows what happens with a quantization error: because of the points at which samples are taken, if you look at the image below, you see the result of some clipping. The peak of that first upswing is clipped off, and the trough at 0.5 is lopped away, and you end up missing parts of that signal. One of the ways to avoid that is to sample more frequently, and also to sample with greater bit depth. The more frequent sampling will catch more of the changes in the audio file; a wider sampling, top to bottom, from 16 to 24 bits, lets you catch more gradations of the signal. So you'll see in this next slide where, particularly in this example, sound was lopped off. 44.1/16 is almost the limit of human hearing; 96/24 captures anything you and I could hear, as well as some things that dogs and birds would enjoy listening to. 96/24 is really intended to give us some headroom, to make sure we miss none of the signal for most of the audio we work with. You may hear from time to time things like: oh, it's just a voice recording, you don't need as much audio fidelity for that. I know many sound engineers who would strenuously argue that the opposite is true: because we have evolved to listen to the human voice very carefully, you need much more audio fidelity than you might for a symphony orchestra recording, something we did not evolve to listen to. 96/24, again and again, has had very strong cases made for it by the preservation engineering community. It's also important to realize that most recordings have several channels. Mono audio has just one channel. Most of the audio we listen to nowadays (and I have a suspicion this could be true even for this webinar) is stereo: the audio is broken into a right and a left channel.
So each one of those channels would be sampled at 96 kHz, 24 bits per sample. This is why audio files start to get fairly large compared to text and image files. It's also increasingly common, especially for motion picture soundtracks, to have 5.1, which is actually six channels: center; front right and left stereo; rear right and left; and a low bass track. Six channels of audio. In addition to those waveforms, the uncompressed PCM waveforms in the audio file, there may be some metadata in a broadcast wave file, but it's relatively small compared to the audio. And then again, CD audio is 44.1; most digital preservation engineers favor 96 kHz to get more accurate reproduction and help avoid errors. As for bit depth, CD audio's 16 bits provide 65,536 levels of amplitude. That's the amplitude from the bottom of the graph to the top of it. That lets you range from zero to 96 dB, which is essentially measuring the intensity of the sound. Rock concerts get up into the hundreds of decibels, 110, 120 dB, and the reason you can still hear rock music on a CD is that very rarely does a recording ever get down near zero. Like that RGB gamut, the audio engineer places the sampling range over the part that's actually heard. Very few CDs contain zero decibels, which is something close to absolute zero, when things just don't vibrate at all. 24-bit audio has a theoretical maximum of 16.7 million levels, from zero to 144 decibels: from absolute zero, no vibration at all, to well beyond the limits of human tolerance at the top. That's actually more audio data than current circuits allow us to transmit; most circuits, even the best ones, allow maybe 100 or 120 decibels of amplitude to be recorded. That said, some people are recording for purposes beyond human hearing, so anyone who's tuning in from the Cornell Ornithology Lab or is working on whale song: there's some information beyond this presentation you may need to acquire. In capturing audio, really in capturing anything, there are sort of three components. One is the source. For audio reformatting, this is usually magnetic tape; it could be LP, or quite often it's a microphone. That's comparable, sort of analogous, to the photo, document, or book that you would be scanning in an imaging workflow. The transformation happens with what's called a digital audio converter. You'll see two types of these: analog-to-digital converters, ADCs, and digital-to-analog converters, DACs. This converter is what actually determines the bit depth and resolution and basic quality of your capture. A better digital audio converter adds less noise and samples more accurately, and it determines the amplitude range and sampling frequency that you can manage. This is where you want to quiz your vendors: what kind of converters are you using, why did you choose them, how much actual data do they capture? This is also where you, as a person who might be setting up an audio reformatting workflow, want to spend a lot of time comparing manufacturers and reading specifications; this is the equivalent of the scanner or camera in your digital imaging workflows. And then at the tail end of that is some sort of audio mastering or editing software. Similar to the image editing software you might use (Adobe Photoshop or what have you), you'll need some sort of software to manage and create audio files after they're captured from that converter.
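Here is the "fairly large" point as arithmetic, a tiny Python sketch of the size formula: bytes = sample rate x bytes per sample x channels x time.

```python
def audio_megabytes(sample_rate, bit_depth, channels, minutes):
    """Uncompressed PCM size, ignoring the small metadata header."""
    bytes_total = sample_rate * (bit_depth // 8) * channels * minutes * 60
    return bytes_total / 1_000_000

print(audio_megabytes(44_100, 16, 2, 60))   # CD-quality stereo hour: ~635 MB
print(audio_megabytes(96_000, 24, 2, 60))   # 96/24 stereo hour: ~2,074 MB
print(audio_megabytes(96_000, 24, 6, 60))   # 5.1 at 96/24: ~6,221 MB
```

An hour of preservation-quality stereo is already a couple of gigabytes, which is why storage planning matters.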
So again, just to review our key specs: the broadcast wave file, which is a WAV file with a metadata header, is the best format for storing digital audio at present. The WAV audio is an uncompressed PCM audio stream, one per channel of the file, and that's sort of our universal format for uncompressed audio. We want to get a resolution of at least 44.1 kHz, preferably more (96 kHz), and a bit depth of at least 16 bits, but preferably 24. So with that, we'll pause before we talk about some other data formats and take questions. Okay, well, we had some more really good questions coming in. Some of them were technical questions, and some of them were more general sorts of questions about audio materials. One of the questions that I think we should start with was just a jargon question: is there a simple definition of sampling? Sampling is actually identical to what you do when you scan an image, right? Audio comes to you as a waveform, a series of vibrations, and each one of those has an amplitude, essentially a height. It's zero decibels, or it's 96 decibels tall, or it's 10 decibels tall, and you sample by recording those heights of the waveform a number of times per second. That's the frequency aspect. So a 44.1 CD recording has 44,100 samples in a given second. It says the waveform's 17, now 18, now 19, now 13, now 12, now 11, now 9, now 23, rapidly, 44,000-plus times per second. So what a digital audio converter does is just measure the intensity of that wave several thousand times per second and store that as a digital file. There was a question about WAV versus BWAV versus broadcast wave: obviously the file extensions are different, but otherwise, what are the differences? Yeah, so it's not terribly inaccurate to say that BWAV is a wrapper for WAV files. They both have the same essential data inside them, these uncompressed PCM audio samples. What BWAV does is add some metadata that tells you useful things, both descriptively (the title of the track, for instance) and technically (that the track is X minutes long, or contains this many megabytes of data). The "broadcast" part of that is metadata that's added to the WAV file, and it comes from broadcasting: it's the sort of thing that, as a radio station, you might want to know, like title, author, and airtime. So BWAV adds some of those key pieces of metadata to the file; otherwise the sound samples are identical between the two. Okay. And the file extension for broadcast wave, by the way, is BWF. Yeah, sorry, I think my response to one of the questions in the chat was wrong there. One question that came up: people are asking about how to convert different types of audio sources to a more preservation-stable format, say from 44.1 kHz to 96, or from 16 bits to 24. What are your general thoughts on that? This is similar in some ways to the question about what happens if I get a JPEG file: if it's born digital, up-converting does you no good. If you take a CD and convert it to 96/24, you just make it bigger; there's no way to add back that data. This is one of the reasons that preservation engineers stress doing a very high resolution capture in the first place. You can go from 96/24 down to 44.1/16 to produce a smaller and still listenable file for use in the reading room. You can't go the other way. Now, you may get data into your collection that's recorded in the MP3 format or as a CD.
Now, you may get data into your collection that was recorded in the MP3 format or on a CD. In that case you just keep what you've got; you can't add data back in. What happens with an MP3 file is like what happens with a JPEG file: there's an algorithm, a way of doing math over that compressed data, that expands it to guess at what the full data ought to look like. When you get born-digital files, you keep what you're given, and if it's a proprietary audio format you might want to convert it to a WAV file, but nothing is gained by converting it into a WAV file larger than the source.

Great. I think we'll hold the rest of the questions. There are a few links that have been posted and various other things, and they're great, but we want to make sure that Jacob can get through all of his material today.

Great. So we'll touch just briefly on video and moving images and some other things. You can guess the poll question: have you digitized video or moving image material? Okay, so similar to audio, a little more even across the board.

We're actually talking about two different sources here, even though we end up in the same place in the digital world. Motion pictures are a series of optical image frames, with a soundtrack that is itself often recorded optically on the film. Video is a series of magnetically recorded signals: waveforms, similar to audio, that encode both images and sound. Video has a specific resolution for each frame that provides a fixed number of scan lines. If you start getting involved in video, you'll start seeing things like 720x480i60. That comes from the NTSC standard we're used to here in North America, which has 486 active lines from top to bottom with 720 pixels across; the i60 means interlaced, which we'll touch on in a moment. Notably, six of those lines are reserved for metadata, and it's graphical metadata, not textual metadata: in those six lines there are certain dot patterns that tell the video engineer something about the video signal. This is, in one sense, tremendously clever, and to those of us now having to reformat it, tremendously irritating.

This is what actually exists in a magnetic video signal. It's a series of waveforms. The first part, the horizontal blanking and the sync tip, helps make sure the video frames are drawn at the right rate. You'll see a little gray box with a blow-up down below; that's the color burst, a very low resolution color wash that is then given definition by the active video portion. The active video is essentially a black-and-white rendering of the image, and when the color is overlaid on that black-and-white rendering you get a crisp color image. This is similar to the way printing is done in color lithography, actually. So it's all a series of waves depicting light and dark and different hues and saturations of color.

Magnetic video, then, is a wave, and when we digitize it we turn it into a raster frame; we turn it into a picture. When you paint video on a screen in the analog world, you paint horizontally across the screen from left to right by sending a beam of electrons very quickly, faster than the eye can perceive. When you digitize, you can actually freeze that final image. So video-to-digital conversion is a dramatic change, while motion-picture-to-digital is essentially going from one way of representing a still image to another. In both areas, standards and practices are still developing. Uncompressed digital video data is desirable for all the usual reasons.
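For a sense of scale, here is a back-of-the-envelope sketch; the 720x480 frame, 29.97 frames per second, and 2 bytes per pixel (typical of 8-bit 4:2:2 color sampling) are illustrative assumptions.

```python
# Rough arithmetic for uncompressed standard-definition video.
# Frame size, frame rate, and bytes per pixel are assumed examples.

def video_gb_per_hour(width: int, height: int, fps: float, bytes_per_pixel: int) -> float:
    """Uncompressed video storage in gigabytes per hour."""
    bytes_per_second = width * height * bytes_per_pixel * fps
    return bytes_per_second * 3600 / 1e9

print(f"Uncompressed SD video: ~{video_gb_per_hour(720, 480, 29.97, 2):.0f} GB per hour")
# Roughly 75 GB per hour, versus about 2 GB per hour for 96/24 stereo audio.
```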
This is something that really bumps up against the problem of storage costs. For text and images, storage has become fairly inexpensive and is no longer a significant obstacle. For sound it can be an obstacle; sound certainly can take up a lot of data. For video and moving images it almost certainly is an obstacle, because of the cost of storage and the size of each file. Compression is pretty normal in video, but it may cause preservation problems. This is an area where the math just hasn't worked itself out yet: storage isn't quite cheap enough, files are still fairly big, and the formats are not at a place where we feel they've stabilized. The current safe bet is uncompressed AVI, but there are plenty of competing formats, including Motion JPEG 2000 and MPEG-21, and the QuickTime .mov format can contain uncompressed video. There are lots of competing formats in the video space.

One positive thing is that the H.264 codec has become the standard way of delivering video to end users. H.264 is a lossy compressed format suitable for web delivery, and part of the good news here is that even if you have a massive video archive and your chosen master format ends up being a dead end as a preservation master, as long as you've created H.264 deliverables you'll have to deal with reformatting your masters, but you won't also have to recreate new deliverable files. H.264 looks like it's here to stay. Unfortunately, the bottom line is that at the end of the day you pick a video format that makes sense for you and you expect to do a migration; it could be five to ten years from now. So H.264 is our standard delivery format: it's the native video format in the HTML5 specification, and it's also the format used in Flash video. There are several more or less proprietary options for getting video to people (QuickTime, RealMedia, Windows Media), and I would not recommend any of them. If you're doing video delivery now, H.264 is the safest bet. You can certainly pick a proprietary option, but a migration is likely in your future.
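As one illustration of that master-versus-deliverable split, here is a minimal sketch that assumes the widely used ffmpeg tool is installed; the filenames are placeholders, not anything from this course.

```python
# Sketch: derive an H.264 access copy from an uncompressed master with ffmpeg.
# Assumes ffmpeg is installed and on the PATH; filenames are placeholders.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "master_uncompressed.avi",  # hypothetical preservation master
        "-c:v", "libx264",                # encode video with the H.264 codec
        "-crf", "23",                     # quality setting; lower is better and larger
        "-c:a", "aac",                    # a common companion audio codec
        "deliverable.mp4",                # web-friendly access copy
    ],
    check=True,
)
```

The point of the sketch is the workflow rather than the particular settings: the uncompressed master stays on the shelf as the preservation copy, and deliverables like this one can be regenerated from it at any time.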
I also want to nod briefly toward data and interactivity before we do final questions. When you're dealing with data, I think a useful question is to decide whether you are trying to store fixed points in time. That is, are you trying to store the contents of a database or a data collection as of a particular moment, in which case you can essentially render things out to the equivalent of an Excel file or an XML file? Or do you need to maintain an active system, one where you plan to keep collecting and sharing data, in which case you have a systems-maintenance question to answer? Sometimes you may be doing both. Examples of how this affects you, probably all of you, are email and social media. Email is a widely known but loosely defined set of standards, and the way people interact with email is tightly coupled to the program they use: you can read the same email message in Outlook and Thunderbird and Apple Mail and on your cell phone, see a different version of it, and do different things with it in each instance, even though the underlying data is the same. Likewise social media, things like Facebook and Twitter, which are what some people call "closed but free": it doesn't cost you anything to use them, but there's no provision to move your data to other systems either. This is, as you can imagine, becoming a very important issue in archiving, and lots of research, discussion, and hand-wringing is going into this space. I've included a couple of links for where to learn more and where to keep track of the issues attached to this. Like video, this is an area where, if you can wait a little while before getting involved, it behooves you to do so. If you have to be involved right now, build a case for the work you're doing and expect that the winds of technology may blow in a different direction.

So, as we mentioned, here's a listing of online resources. It appears on the website as well, and I've also put a page on my own personal website, jacobnadal.com/342, which is a persistent URL. There's a contact form you can use to send me a message, and if there are big updates or things that need posting, we can add them there. Most of the content will live on the Connecting to Collections website, though, including this online list. So with that, let's do wrap-up questions.

Okay, we're getting really close to the end of today's session and don't have a lot of time, so there are some things we're just not going to be able to get to. One was about codecs and the problems you may run into with video that you just can't read or play back. So, again, we touched lightly on video. Each video codec is something like what we talked about with UTF-8: there's video, the idea, and there's the codec, the way of actually writing that data to disk. For video, the top-level file format that you see, .WMV or .MOV or .AVI, is often a wrapper, and inside that wrapper is something like: here is some video data, and you should decode it using the H.264 codec. So to read a video you need all of these pieces: a wrapper to hold everything together, the video and audio data itself, and the codecs to interpret it. If you think about what goes into a motion picture, there's a series of images that you watch as frames, the visual part of the picture; there are between one and six audio tracks; there may be subtitles and captions; on a DVD there may be alternate camera angles. What the top-level video format does, rather like the PDF format by way of comparison, is show how all those things relate to one another, and then for each one it says: render the video using this set of tools, render the audio using this other set of tools, display the closed-captioning data in this way. So video files are themselves very complex multimedia objects.
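To see that wrapper-plus-codec structure for yourself, one approach is to ask ffprobe (a tool that ships with ffmpeg) to list each stream and its codec; this sketch assumes ffprobe is installed, and "mystery.mov" is a placeholder filename.

```python
# Sketch: peek inside a video wrapper to see which codecs its streams use.
# Assumes ffprobe (part of ffmpeg) is installed; "mystery.mov" is a placeholder.
import json
import subprocess

result = subprocess.run(
    ["ffprobe", "-v", "quiet", "-print_format", "json",
     "-show_format", "-show_streams", "mystery.mov"],
    capture_output=True, text=True, check=True,
)
info = json.loads(result.stdout)

print("Wrapper:", info["format"]["format_name"])  # container, e.g. mov/mp4
for stream in info["streams"]:
    # Each stream (video, audio, subtitle) declares its own codec.
    print(stream["codec_type"], "->", stream.get("codec_name", "unknown"))
```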
Well, thank you very much for all of that information. We're getting thanks and congratulations and whatnot in the chat box, and I want to turn it back over to Kristen so she can give you some final instructions.

Thanks so much, you guys. I think that was really useful and helpful; I know it's a lot to cover in a very short amount of time, so I think everyone will find the links really helpful. People have been sharing great links in the chat, and we'll get those up. We just want to quickly give you the link to the homework assignment, which is also linked through the course page. Jacob has posed a question about the factors you would need to consider; there are some sustainability factors, and he talked a lot about how different formats have been determined to be more stable, while of course other things are still to be determined. There's a great Library of Congress website you'll be asked to look at, and you'll list three potential factors you might need to consider when choosing a format, depending on what items you have in your institution. Then there's a little chance for you to express your own opinion. This is, as you know, an evolving science, and there are lots of decisions you have to make, so we've asked for your opinion on something; there's no wrong answer, we're just interested to hear what you have to say. And we've asked you to let us know if you're watching with colleagues today. If you watched by yourself, don't worry: we saw when you logged in, and you'll be counted as present. But if you watched with colleagues, let us know so we can count them. We had over 400 participants today, which is outstanding, and we really thank everyone for their time. We're just at 3:30. Danielle, did you want to ask any more outstanding questions, or should we try to get them addressed in some other manner? Jake, can you stick around for a little bit longer?
Yeah, I've got a couple of minutes. And I'll say to our New York contingent, we can continue the chat over coffee. I've got a few more minutes if we want to take a couple more questions.

Well, there were two questions about material types that we didn't cover today, and one of the specific types was CAD data.

Yeah, you don't want to lose those. CAD is an emerging problem for the digital preservation community, and it's obviously critically important: everything we walk on, sit on, drive, or fly in every day has a CAD file somewhere in its past. A CAD file is in principle preservable. A CAD file says things like: there are two lines that connect at a certain point, they go off in these different directions, and they should be made of magnesium. The issue we're grappling with is that most CAD formats are proprietary; they're produced by a certain manufacturer as part of their software package, and they're bound pretty tightly by IP restrictions. So this is an area where I think you need to be alert and run with the herd. Anything produced by the AutoCAD and Autodesk family of products has a large user base and a lot of investment behind it. You're likely to know quickly when changes to those applications are coming, and to have plenty of time and tools to support you in migration, but you are going to be tightly coupled to a proprietary manufacturer. I don't see or know of really compelling free and open alternatives at this point, though there's a lot of discussion about this in the digital preservation communities. In some ways the best answer right now is to stick with the de facto standards: rather than small-market, low-support CAD programs, make sure you're using well-supported programs that have a big user base.

Great. Well, I think our best bet for the remaining questions we didn't get to is to type up some responses and post them later on, so please check back on the website and we'll try to get those answers up. But thank you all.

Thank you so much, Danielle, and thank you, Jacob; we really appreciate the great content we had today. As Danielle said, keep checking that website. You'll be getting an email from us later today with a recording of today's webinar, a link to the website, and other reminders. We look forward to seeing everybody back here next Tuesday at 2 p.m. Eastern time, and that's April the 9th, when we'll hear from Danielle about metadata, which came up a little today, so I know it'll be great. Thanks, everyone, and have a great rest of the day. I think the recording has paused at a great spot.