Please go ahead and begin whenever you're ready, Jessica. Wonderful. Thank you, Mike. And hello, everyone. I'm Jessica with Heritage Preservation, and we're so glad that you're all joining us here today. Before we get started, let me just give a quick introduction to the community, and then we'll move on from there. The Connecting to Collections online community was originally created in cooperation with the American Association for State and Local History and with funding from the Institute of Museum and Library Services. The community and webinars are moderated by Heritage Preservation, and LearningTimes is kind enough to produce both our website and webinars. The goal of the online community has always been to help smaller museums, libraries, archives, and historical societies quickly locate reliable preservation resources and network with colleagues. To help you do that, we have compiled an extensive list of online resources that are broken up by topic on the online community. In addition, we also host free drop-in webinars, like the one today, on topics that we hope you'll find useful. A recording of all of our webinars, including this one, can be found under the webinar archives. And of course, if you're interested in continuing the discussion, you're welcome to sign up as a member of the community and post questions to the discussion board. Beginning next month, the American Institute for Conservation, known as AIC, will take the lead on organizing the C2C community. The community will be renamed Connecting to Collections Care, or C2C Care for short, to more accurately reflect its focus. C2C Care will continue to offer educational events, including monthly webinars on collections care topics, while encouraging discussion among colleagues on the discussion board. The community augments AIC's growing digital presence and provides a forum for AIC to reach the broader collections care community.
Be sure to check out the C2C homepage for more information about this transition. Today, I'm pleased to welcome Lynda Schmitz Fuhrig. Lynda is Electronic Records Archivist at the Smithsonian Institution Archives. She is responsible for the preservation of and access to a variety of institutional born digital records, which include texts, images, audio, video, websites and social media accounts, email accounts, and databases. She is a Steering Committee member of the Society of American Archivists Electronic Records Section and teaches a course on email preservation for SAA. She also has contributed to the Federal Agencies Digitization Guidelines Initiative on born digital video. She has a master's degree in history from the University of Illinois at Springfield. Lynda, thank you so much for joining us. I'm just going to go ahead and pull over your PowerPoint and remind folks that if they have questions during the presentation, feel free to type them in the chat window and we'll make sure that we get to them by the end of the hour. Lynda, you can take it from here.

Great. Hello, everyone. And thank you for the introduction, Jessica. And thank you, everyone, for joining me today. The title I have today is a little cheeky, but I think we all know digital assets need regular monitoring and grooming. We just can't store them somewhere and forget about them. So today I'm discussing digital assets and issues with migration, preservation, and access. This involves both digitized objects, that is, materials that are in analog form and a digital version is created from them, as well as born digital materials, which started out in electronic form. You may have these files from a piece of removable media like a CD, a shared drive, a floppy disk, or an email attachment. Reformatting is an interesting term. I've heard it used only with analog materials, but I've seen it applied more often now to born digital as well.
And my work here at the Smithsonian involves born digital records 90% of the time. Since this presentation is a high-level overview of digital assets, there probably will be some topics that will not be covered today. But I do have my email address at the end in case there are questions I don't get to address. Any products that I mention today are listed in the description. Some questions for you to think about during the presentation. Before you can even think about formats, you need to know what you have in your collection. Have you done an inventory? As we know, an inventory is a great foundation. The information should include the media type. Is it a CD, a DVD, a floppy disk, Zip or Jaz disks, data tapes, and so on? Are there any notes on the media or on its enclosure? If you can't tell what the media is, obsoletemedia.org is a website from the Museum of Obsolete Media that can be very helpful in this regard. It has pictures of what the various carriers look like and what they are, including data, video, and audio. The University of Illinois also is working on the Preservation Self-Assessment Program, or PSAP. Its website also has a format ID guide with images of wax cylinders, acetate film, and others. And they actually did a Connecting to Collections webinar earlier this month that you can refer to. If these are digital scans of materials, where are they stored? When were they digitized? Do you consider them to still be accessible and optimal, such as having the quality you want in terms of resolution? You can record this information in a database or a spreadsheet, or take a picture of the media and keep those images organized. The image here is of our collection management system. It notes the accession name, the box number, if it was in one, as well as the folder name. We also provide a description of the media and what the label indicates, as well as any volume labels.
We'll also note the preliminary assumption of what the files are, based on the file extensions we see on the media. But keep in mind, files might not be what they say they are, and I'll get into that a little more later. Even the manufacturer of the media can be useful information, especially with obsolete formats. So again, are these files accessible? Do you have the equipment? Have they been transferred or copied over already? And where are they? Do you have software to read the files themselves in their present format? Do you think those files are anywhere else at your organization or outside of it? And do you have backed-up copies of those files? When you do copy files off of media, take precautions to write-protect when possible. You can do this easily with floppy disks by setting the tab to open. There are other software and hardware products that can do this with other types of media. We're not getting into digital forensics today, but there will be a reference at the end of the presentation about the BitCurator project. And a word of caution: don't think that since files are on something recent like a USB thumb drive that they're safe. They have a limited number of read/write cycles, and we've all experienced media failure and cannot expect them to last forever. Our best practice at the Smithsonian Archives is to transfer files off media as soon as possible. Research by the Library of Congress, which has received some good press coverage this year, as well as by other organizations, has shown that lifespans can vary greatly with CDs and DVDs, even when they are manufactured at the same time. Studies have shown life expectancies anywhere from two to 25 years. I think the writing is on the wall regarding CDs and DVDs as streaming and cloud storage become more prevalent. Many computers do not come with optical drives anymore.
And there's also the issue of mishandling and poor storage. Ink can be written on the media, which can damage it as well. One of the worst offenders is adhesive labels. As you can see here, a label can bubble and actually remove data from the media if it does come off. Recently, I had a colleague come to me with a CD with an audio file on it, as recent as 2010, but someone had put masking tape on the CD and removed it, which damaged the data. Data on CDs and DVDs tends to be on the layer closest to the top. But I was able to use some software to retrieve about 94% of the audio for them. That may not have been possible previously with what was available to us. Before ingesting your files, I suggest that you inspect the media, see what kind of shape it's in, and do a virus scan. If you're working with files that need to be transferred or copied, avoid just dragging and dropping to a folder. On a PC, this method can cause the date created to change to the current date, and you also risk not capturing everything. To avoid this, BagIt, a tool from the Library of Congress and NDIIPP, the National Digital Information Infrastructure and Preservation Program, is very helpful. TeraCopy, and rsync for the Mac, also are good tools for transfer, and they're all free. Here we see the BagIt GUI, a graphical user interface known as Bagger. BagIt also works from the command line. What is good about this tool is that it runs a check of your original files and then validates the copy against the original to let you know that the copy really is the same. It creates a manifest that lists the files and the MD5 hash that is generated for each file. In addition to having a folder called data with the files that were transferred, Bagger will also output metadata files, including transfer size and transfer date, as well as some other details. So why is it good to have a checksum or hash? Hashes are unique for every file unless one file is an identical copy of another.
This is a better method of comparing files than relying on file names, dates, and sizes, as one file could be the draft and another could be the final version with the same name. It's useful to have this data upon ingest to compare to what you received, as well as being able to compare over time, such as weekly or monthly, to make sure files are the same and have not suffered corruption. Fixity is a free tool from AVPreserve that was released this year. Fixity scans a folder or directory, creating a manifest of the files, including their file paths and their hashes. Then the tool can be run at regular intervals to recheck those files against the originally generated hashes, and it can use either MD5 or SHA-256. It notes if there are new, missing, moved, renamed, or changed files, and it can also email the report to the user. Here you see when it first runs on a directory and its manifest; the report also indicates the file status. Then, after a file is changed and Fixity runs again, it notes the change because the hash has changed on that file, no matter how small the change might be. If you're using something like a digital asset management system or something similar in your organization, it probably is set to run fixity checks, which is something you should confirm with your organization. So once you have a handle on how much you have in digital form, storage needs to be considered for the present and future. Do you have enough? Are you expecting an increase in digital assets? I think all of us can answer yes to this. You also need to consider the preservation master and the access copy. Do you want to store them in-house or in the cloud? In both cases, it's good to know when backups are done, which is different from having extra copies. And what is the disaster recovery plan? If you do go the cloud route, what can the provider do with your assets? Are you able to access those files yourself 24/7? Can your users as well?
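As an illustration of the fixity workflow just described, here is a minimal Python sketch of creating a hash manifest for a folder and rechecking it later. This is a simplified stand-in for what tools like Fixity do, not their actual implementation:

```python
import hashlib
import os

def hash_file(path, algorithm="sha256"):
    """Hash a file in chunks so large files don't exhaust memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Walk a directory and record a hash for every file."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            manifest[rel] = hash_file(path)
    return manifest

def recheck(root, manifest):
    """Compare the current state of a directory against a stored manifest."""
    current = build_manifest(root)
    changed = [p for p in manifest if p in current and current[p] != manifest[p]]
    missing = [p for p in manifest if p not in current]
    new = [p for p in current if p not in manifest]
    return changed, missing, new
```

Running `build_manifest` at ingest and `recheck` on a schedule gives you the same kind of changed/missing/new report, which you could store as JSON or CSV alongside the files.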
You need to read the terms carefully. Some organizations are relying upon CDs and DVDs for their storage due to budget. I suggest investing in an external drive, if possible, that you can back up as budget allows. I encourage you not to have just one copy, and migration to new media needs to be an important part of your plan, as well as having a copy off-site if possible. Here at the Smithsonian Archives, we rely upon backup servers and LTO tapes. There are software tools available that can help you drill down into your file types as well. You might have thousands of image files that take up 500 gigs of space, while someone else might have the same number of files in video, and those are going to take up much more space because of what they are. So what are the technical resources that you have? Do you have dedicated IT staff, or is it just you? You might need to invest in some training. So is it time to do something with these files, whether you're just ingesting them off the media for the first time or scanned them a few years ago and need to make sure they remain accessible? You'll have better luck accessing a PDF file versus a WordStar file. So some of the approaches: migration is converting the original format to something else; this typically is done when obsolescence is an issue. Normalization is converting files into another format, typically a smaller set of preservation formats, and this is done on ingest. And then we also have emulation, where one attempts to recreate the environment that the file originally was in. Some commercial products like Preservica can monitor your files for obsolescence and plan for migration. So the files themselves: we have everything here from text to images to websites to AV files, and long-term accessibility is important. Every four to five years, we sample our accessions that have been preserved to make sure they are still accessible. Do you have a digital preservation policy in place?
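On the earlier point about drilling down into your file types, a short Python sketch can total the storage used per file extension, which quickly shows where the space-hungry material (usually video) lives:

```python
import os
from collections import defaultdict

def size_by_extension(root):
    """Total bytes and file counts per extension under a directory."""
    totals = defaultdict(lambda: [0, 0])  # extension -> [bytes, count]
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            size = os.path.getsize(os.path.join(dirpath, name))
            totals[ext][0] += size
            totals[ext][1] += 1
    # Largest first, so space-hungry types surface at the top.
    return sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
```

The output pairs each extension with its total bytes and file count, which you can paste into a spreadsheet for review.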
You also want to consider if you want a disk image, that is, all the files bit for bit off the media, or if you just want some of the files. This depends on what your collecting focus is. If you're dealing with manuscripts from an author, you probably want a disk image. A researcher might be interested in seeing some of those early drafts. And again, transfer with care with a tool that can validate what you've copied. But what if you don't know what you have? Format tools such as JHOVE, both version one and two, and DROID can help. Both of these tools are well established and commonly used in digital curation and preservation. JHOVE is the JSTOR/Harvard Object Validation Environment, and it can identify and validate 12 open formats that include JPEG, PDF, TIFF, and WAVE. Digital files are checked against the requirements of the format for being well-formed and valid. If it says it's a PDF, is it? DROID is a software tool developed by the National Archives in the UK and works with more file formats. DROID uses internal signatures to identify and report the specific file format and version of digital files, and it relies on the PRONOM registry, which is updated regularly. There are other tools as well, like FIDO from the Open Planets Foundation and FITS from Harvard, which is not to be confused with the FITS image format used in astronomy and by the Vatican with some of its digitized images. So here we see the DROID tool and its user interface. This might be a little hard to read, but it's a list of 20-year-old files that either have odd extensions that are not used these days, like .rr and .rec, or have no extension at all. DROID is telling me that they are WordPerfect for DOS based on the evaluation of the internal signature, which is the byte sequence common to a particular file format. DROID also works from the command line, and I'll talk a little more about that later as well, because the command line can be intimidating to some.
With both tools, you can export details about the file, such as name, hash, and MIME type. JHOVE does produce more metadata, but do keep in mind that these tools are not perfect and sometimes might miss something or cannot determine what a format is. Okay, here we're looking at a XyWrite file, which is a very old word processing format. There's also viewer software available that can help you figure out what something might be. So be aware of older word processing files from the 1980s and 90s that lack extensions or have ones like .let for letter or .mem for memo. There are even some .doc files that are actually WordPerfect files. So when possible, you want to go with formats that are well established, widely used, and open, that is, non-proprietary. Encourage your depositors to do this if you can. Files also should not be encrypted or password protected. A good reference about digital formats is the Library of Congress's Sustainability of Digital Formats, and here you can see some information about the various versions of JPEG and JPEG 2000. Once you do have the files on something more stable like a backed-up server or machine, you can start considering the formats for preservation and access and what you want to do with them. We're assuming the files we're talking about today have been appraised for long-term or permanent retention. First of all, you always want to work with a copy of the file that you might want to migrate. Will you do the conversion, or will you send the material out? You also should keep that original file, which sometimes includes the media, in case you need to go back to it or better tools come along that allow for better conversions. And be sure to document the work you do, whether that be in a processing note, a database, or a metadata standard. So what if you don't know what you have after trying various format tools? Consider what it might be. Are there any clues on the media itself? Is it something you actually want to keep for future access?
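One more way to look for clues is to inspect a file's opening bytes yourself, which is the same idea behind DROID's internal signatures. Here is a minimal Python sketch with a handful of well-documented magic numbers (including the WordPerfect signature); a real tool should rely on the PRONOM registry rather than this short list:

```python
# A few well-known "magic number" byte signatures (not exhaustive).
SIGNATURES = [
    (b"\xffWPC", "WordPerfect"),
    (b"%PDF", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x89PNG", "PNG"),
]

def sniff_format(path):
    """Guess a file's format from its opening bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(8)
    for signature, name in SIGNATURES:
        if head.startswith(signature):
            return name
    return "unknown"
```

This will catch a .doc file that is really WordPerfect, for example, but it returns "unknown" for anything outside the list, so treat it as a first pass, not an identification.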
And bit-level preservation, that is, keeping it as is, might be all that you can do at this time. Word processing files and other text-type files are usually straightforward, but since most are in proprietary software programs like Microsoft Word, WordPerfect, and others, there can be issues. Quite often the files will open in another program, but the fonts and layouts may not be the same. Even different versions of the same software can make files look different, for example, Microsoft Word 2007 versus Microsoft Word 2010. Consider if this is an issue. If it's an invoice with a regular typeface, perhaps it's not a problem. But if it's correspondence from an artist who uses a particular font, it probably is. So some of the options to consider are creating a PDF or a PDF/A, which is the PDF archival version of the file. OpenOffice is open source software that can read many office suite files. Be aware that it might look different, though, from the original. Or leave it as it is if it still opens in the original software. Our practice here is to create a PDF/A or PDF since the word processing software is proprietary. But again, PDF/As are tricky to create and need to be 100% self-contained. The requirements for PDF/A-1 are more strict than for a regular PDF. Audio and video content is not allowed. No encryption. We're still using PDF/A-1 here, even though versions two and three are also options. So here we see a press release for a Lewis and Clark exhibition. The font looks straightforward, and it was converted to PDF/A. If you're using something like Acrobat to make a PDF/A, you'll see a message at the top that indicates that you are viewing it in PDF/A mode. But this does not mean that it is a PDF/A file, and you need to verify it. If you look at the upper left-hand corner, you see that it has failed verification for conformance. One of the causes of this is embedded fonts.
The Minion font used in the Smithsonian sunburst has rights associated with it and cannot be embedded, which is a requirement for PDF/A. Nevertheless, the file is perfectly viewable as a PDF. There are batch tools that can help with this type of conversion and migration work. Also, if you plan to post these types of documents, a PDF version of a Word doc is more accessible to the public. If it's just a plain text file, we'll leave it as is and make a copy of it. And if you have just one word processing file that you're uncertain about, you can actually take a copy of it and open it in something like Notepad. If it's WordPerfect, there will be a WPC near the top, which stands for WordPerfect Corporation; that also tells us that this is an old file, because they're now owned by Corel. And if it's MS Word, that will be indicated toward the bottom of the file, along with what version of Microsoft Word it is. Spreadsheets can be easy to create as PDFs when they are simple worksheets. It becomes more complicated when there are macros involved. And again, it's always good to keep that file in its original format. Options, again, are creating a PDF or PDF/A version of the file. Remember that there might be additional sheets in the document, or the page size might need adjusting to get desired results. So again, you want to work with a copy. Here we see a file in its original software, Excel, and this is what it looks like in OpenOffice. Some other options: leaving it as is if it still opens in the original software. And CSV is a format to also consider, since it opens in many software programs, but you will have no macro functionality. Images tend to be JPEGs, TIFFs, and other well-established types. In many cases, you can still open them and just need to monitor them over time. For digital scans of text and images, our preservation format is uncompressed TIFF with 6,000 pixels on the long axis of the image.
For color and black-and-white images, a 24-bit RGB setting is used. With images from microfilm, we actually use a lower resolution of 300 PPI grayscale. And for the web, we're posting JPEGs with a lower resolution. So if you digitized some images at 300 PPI 10 years ago because of the limitations in equipment and standards then, you might want to consider whether you want to redo them at a higher resolution now. With born digital images, we have no control over the resolution or the format, and nothing is gained by creating a TIFF from a JPEG. Note that when you do a save-as on a JPEG, it will recompress and you're going to lose more quality. So just copy those files; do not do a save-as on them. Here we have a digital image, and the photographer actually did embed some metadata with it, so it is worth checking your files for this. If you are creating digitized images yourself, consider embedding metadata, since it lives with the image. IPTC, or International Press Telecommunications Council, metadata includes headline, description, copyright, creator, and keywords. I mentioned earlier that another reason for keeping that original source is that better software is developed over time, and this was the case with Kodak Photo CD files. Developed film was scanned onto CDs that contained up to 100 images and saved as the proprietary PCD format rather than JPEG or TIFF. Kodak no longer supports the product, and we have about a thousand of these images in our collections, and many museums at the Smithsonian do as well. Some of them were migrated previously into TIFF preservation files, but we were not capturing the entire file with the software we were using. The file size was set to a smaller one during conversion from its original size on the CD. And other software that could previously convert the PCD files discontinued the needed plugin in a software upgrade.
Using the newer software, our latest conversion to TIFF files has resulted in full-size files with higher resolution, and we actually got metadata about the film and scanner that was not present previously. So all our collections with PCD files have been migrated to these better versions. I also want to mention that the Smithsonian has a good document about still image embedded metadata that is included in the references that we'll discuss later on. Some other common image formats you might encounter are JPEG 2000, both lossless and lossy, PNG, Digital Negative (DNG), Bitmap (BMP), and GIF. With graphics and presentations, we will convert to TIFF or PDF, usually with Quark and Publisher files. Presentations like PowerPoint usually can be converted into PDF, but be aware that sometimes notes are included with the presentation and might need to be captured with different settings. And again, you might be able to view them in OpenOffice. Also keep in mind your organization's plans for software and hardware upgrades and the impact they can have on your digital files. Support for Windows XP ended this year by Microsoft. Here at the Smithsonian, everyone on the network was pushed onto Windows 7, meaning we had to take some machines off the network in order to still work with some of our software and hardware for processing of older files. With audio, our preservation format is Broadcast Wave, WAVE, or AIFF, all of which are uncompressed. This applies to cassette tapes and DATs, digital audio tapes. Audio reels typically have to be sent to a vendor for that conversion work. Broadcast Wave files are essentially WAVE files that allow for embedded metadata within extension chunks in the file. With the audio, our preference is uncompressed PCM, pulse code modulation, Broadcast Wave, with a minimum bit depth of 16 and a sample frequency of 44.1 kilohertz. 24-bit and 96 kilohertz are considered optimal.
We have the lower threshold because of how digital audio tapes are recorded, and nothing is gained by pushing them higher. If you are working with a CD with audio files that you have rights to, be sure to rip the files as WAVE or AIFF, depending on the software you use. For instance, if you were just to copy them, you'll end up with CDA files on a PC, and these are just pointer files; you actually will not have any audio. Other audio formats from digital media include MP3, FLAC, M4P, OGG, and WMA. Access files typically are MP3, as this is the most common format. BWF MetaEdit is an open source tool that creates Broadcast Wave files by allowing you to enter the required metadata. It was developed by FADGI, the Federal Agencies Digitization Guidelines Initiative. It's very useful in that it can validate entries and do batch editing. Here you've got your file name, description, the software that was used for the reformatting, as well as what agency has the file. Commercial software also is available to work on BWF files. Born digital video formats are a challenge. Again, the Library of Congress and other federal agencies, including the Smithsonian, have been tackling this and will be issuing some guidelines in 2015 for both creators and archivists and others who are working with these files. And again, this is part of FADGI's work. One example I can present is our authored video DVDs. These are the DVDs that have menu functionality when you play them. What we do here is create an ISO disc image, which is a bit-for-bit copy, and it allows the menu functionality to work, dependent on the software that we're using, but it's not reliant on the DVD itself. And we also create an MPEG-2 access copy. Other video we have at this time, we check for playback, and keep in mind that one player might work while another does not. VLC, an open source free tool, is known for being able to play back more formats than other software.
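Returning to audio for a moment: the 16-bit, 44.1 kilohertz minimum mentioned above can be checked programmatically. Here is a sketch using Python's standard wave module, which reads uncompressed PCM WAVE files (Broadcast Wave extension chunks are beyond it):

```python
import wave

def check_audio_minimums(path, min_rate=44100, min_bit_depth=16):
    """Report a WAVE file's sample rate and bit depth against minimums."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        bit_depth = w.getsampwidth() * 8  # sample width is reported in bytes
    return {
        "sample_rate": rate,
        "bit_depth": bit_depth,
        "meets_minimum": rate >= min_rate and bit_depth >= min_bit_depth,
    }
```

Running this across a folder of ripped files is a quick way to flag anything captured below your preservation threshold.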
And there are many types of video that you may encounter, including AVI, QuickTime, OGG, Flash Video, and M4V. MediaInfo is a useful tool that can give you technical metadata about a video file you're dealing with in terms of video and audio codecs, bit rate, aspect ratio, and more. And if you are looking at converting or re-encoding video files, FFmpeg is another powerful free tool that has a lot of flexibility, and it works best through the command line interface. Analog video is much easier for us to deal with here at the Smithsonian. We have two SAMMA Solos made by Front Porch Digital, which we use for the reformatting work. The machines are mounted in a rack along with five tape decks: VHS, Betamax, Betacam SP, DigiBeta, and U-matic. We have a lot of U-matic tapes here. The process of migrating a tape starts with entering metadata into the SAMMA Prep, which is a laptop that shares a database with the two Solos. Once the metadata is entered, the Prep prints out a set of labels containing a QR code and the tape name. Since the Solos and the Prep all share a database, a video can be migrated on either machine. Once the migration has started, the SAMMAs digitize video in real time. The result is four different video files: two that we use for preservation, Motion JPEG 2000 in an MXF wrapper, and an MPEG-2 in a 4:2:2 format, and then two that we use as reference access copies, a Windows Media Video file and a QuickTime file. In addition to the video files, the SAMMA also outputs two documentation files. The first is a PDF that shows a graphical representation of the signals transmitted during the migration, including color and audio levels for each channel. The second supplementary file is an XML file that not only provides any metadata entered into the SAMMA at the beginning of the migration, but also shows a frame-by-frame breakdown of the MXF file. If you are working with video creators, encourage them to create the highest quality possible.
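Since FFmpeg works through the command line, here is a sketch of building (but not running) an FFmpeg command for a lossless FFV1 transcode in a Matroska wrapper, a common preservation target. The file names are hypothetical, and you would pass the list to something like subprocess.run only after testing on copies:

```python
def ffv1_command(source, target):
    """Build (but do not run) an FFmpeg command line for a lossless
    FFV1/Matroska preservation copy, leaving the audio stream untouched."""
    return [
        "ffmpeg",
        "-i", source,      # input file
        "-c:v", "ffv1",    # lossless FFV1 video codec
        "-level", "3",     # FFV1 version 3
        "-c:a", "copy",    # pass audio through unchanged
        target,
    ]
```

Keeping the command construction in one small function makes it easy to log exactly what was run for each file, which feeds your processing notes later.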
Uncompressed or lossless is desirable, even though it does mean huge files. But files that are too compressed are very difficult to preserve. And a quick note about file names: when you're digitizing, you can apply a standardized naming convention. But when you're working with born digital materials, you need to decide if you're going to retain the original file names or change them, with a note or log file indicating the original names. And now some other formats that you may encounter or already have. CAD, or computer-aided design, is very tricky. It gets complicated due to the various levels and layers and functionality within the document. What is trying to be preserved? Who will be using it? There's a potential for so many different audiences with these files. We tried PDF/E with mixed results a few years ago, and the format really has not seemed to have caught on. We're now going with PDF as well as keeping the files in DWG format and relying on viewers that are now available that were not a few years ago. One standard in this field is STEP, the Standard for the Exchange of Product Model Data. And if you're looking for more research on this topic, MIT did a two-year project called FACADE, and the Art Institute of Chicago's Department of Architecture also did some research on this topic. Email accounts are important, and I really feel that email remains a relevant communication tool within our organizations. We conducted a three-year email preservation project with the Rockefeller Archive Center called CERP, and we were able to create a parsing tool and co-developed an XML preservation schema with the North Carolina State Archives that we still use today; it was developed with small to mid-sized organizations in mind. We took the account approach, while other email preservation projects have focused on individual messages.
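On the earlier point about file names: if you do rename born digital files to a standardized convention, keep a log mapping the new names back to the originals. A minimal Python sketch, where the prefix and log location are up to you (keep the log outside the folder being renamed):

```python
import csv
import os

def rename_with_log(folder, prefix, log_path):
    """Rename files to a numbered convention while logging original names."""
    with open(log_path, "w", newline="") as log:
        writer = csv.writer(log)
        writer.writerow(["original_name", "new_name"])
        # Sort so the numbering is reproducible from the same folder.
        for i, name in enumerate(sorted(os.listdir(folder)), start=1):
            ext = os.path.splitext(name)[1]
            new_name = f"{prefix}_{i:04d}{ext}"
            os.rename(os.path.join(folder, name),
                      os.path.join(folder, new_name))
            writer.writerow([name, new_name])
```

The CSV log becomes part of the accession documentation, so a researcher can always trace a standardized name back to what the creator called the file.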
Email is complicated because it is proprietary, contains lots of attachments in different native formats, and has complex relationships within the account. The point here is not to rely on the original email format alone. In addition to the XML format, some organizations consider email converted to MBOX, which is plain text with encoded attachments, to be their archival version. I suggest not relying on PDF versions of email messages, because they do not capture header information. Also, keep an eye on the ePADD project by Stanford. They're developing a tool that helps with appraisal and reading of email accounts, and we're one of the partners who is actually helping them test the tool currently. Databases, again, are not an easy task, especially when it comes to some older ones from the 1980s and 90s. Sometimes all you can do with these is bit-level preservation, meaning you take them in and don't do anything with them at the time. The Swiss Federal Archives, though, has developed a tool called SIARD, Software Independent Archiving of Relational Databases, that works with Access and SQL databases and creates XML output called the SIARD format. And XML, again, is good because it's human- and computer-readable and has a lot of flexibility. I just want to quickly mention web archiving, since this affects all our organizations. If you are archiving your web presence, that is great. Many organizations either use a service for crawling or do it in-house with a tool like Heritrix. There are also some small-scale tools that you can install yourself, like WARCreate and WAIL, developed by a student at Old Dominion. And I feel it's important that we capture our websites and social media, because there's history there that's not available anywhere else. Other methods include getting the website files from the content management system, using HTTrack, which pulls all the files in their native format, or even just doing screenshot captures.
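Returning to email for a moment: the MBOX format mentioned above is plain enough that Python's standard mailbox module can read it. A sketch that summarizes an account's messages, useful as a starting point for appraisal rather than as a preservation tool:

```python
import mailbox

def summarize_mbox(path):
    """List sender, subject, and attachment count per message in an MBOX file."""
    summary = []
    for message in mailbox.mbox(path):
        # walk() visits every MIME part; parts with a filename are attachments.
        attachments = sum(
            1 for part in message.walk() if part.get_filename() is not None
        )
        summary.append({
            "from": message["from"],
            "subject": message["subject"],
            "attachments": attachments,
        })
    return summary
```

A listing like this helps you see at a glance which messages carry attachments in native formats that will need their own format review.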
Social media is harder because of how those sites are constructed and how often changes are made on the back end by the social media network. Twitter actually announced this week that you can search all of Twitter now, which sounds like a good thing to me. Also, if you're working with individuals, you can ask them to create archives of both their Twitter accounts and their Facebook accounts; that functionality is available within those services. I have mentioned the command line a few times today. If this is something you're not familiar with, I say go out there and learn it. It can be a very helpful tool, but it's also scary, and you don't want to delete something by mistake. And software programs like FFmpeg really work best with it. You have so much capability with it: you can get a directory file listing, output that information into a text file, and then use it in a spreadsheet for review. Other outputs can include the owner of a file, its size, and when it was last accessed. And there are some good tutorials available online to really learn how to dig into this. Of course, we can't forget about metadata and processing and accession notes. Document what kind of migration has been done and what else has been happening with the accession or files. There are lots of standards out there, and they can be overwhelming. Use what works best for your organization and be consistent. Take a look at Dublin Core, METS, and PREMIS. Descriptive metadata can tell you what the object is, what its name is, and so on. Technical metadata gets into information about the software and format of the digital asset. Administrative metadata includes the rights and acquisition information. And tools like JHOVE, DROID, and MediaInfo can help with extracting some of that metadata for you. The Dublin Core metadata element set consists of 15 elements which address the most basic descriptive, administrative, and technical metadata required to identify a digital resource.
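The directory-listing-to-spreadsheet idea above can also be done in Python rather than with shell commands, which avoids the risk of a mistyped destructive command. A minimal sketch (the function name and CSV columns are assumptions for illustration):

```python
import csv
import os
import time

def inventory(directory, csv_path):
    """Walk a directory tree and record each file's path, size in
    bytes, and last-accessed time in a CSV for spreadsheet review."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "size_bytes", "last_accessed"])
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                st = os.stat(path)
                accessed = time.strftime("%Y-%m-%d %H:%M:%S",
                                         time.localtime(st.st_atime))
                writer.writerow([path, st.st_size, accessed])
```

The resulting CSV opens directly in any spreadsheet program for review, just as the command-line output described above would.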
These include creator, contributor, publisher, title, date, language, format, subject, description, and rights. METS, or the Metadata Encoding and Transmission Standard, contains descriptive, administrative, and structural metadata regarding objects within a digital library in an XML document. And Dublin Core can be used inside of METS, as seen here. And then there's PREMIS: Preservation Metadata: Implementation Strategies. Its data dictionary is the core set of preservation metadata elements. You can record the media type, age of the files, hash, and so on. And all of these metadata outputs can be part of your AIP, or Archival Information Package, for the object or objects. So once you have the preservation file, you'll need to consider what the access copy will be. Will it be a copy of the preservation file? Or will it be something smaller, like an MP3? Do you have a finding aid on your website to help the public with discovery? Will you be putting these files online on your website, blog, or social media? Or do researchers need to come to you, or will you provide access in another way? And do you want to put the files on Flickr, HistoryPin, or some other third-party site? We actually have had some success in having images identified by the public on Flickr; people have identified family members and contacted us about it. And crowdsourcing is another way to get your materials out there and get some help from the public with materials that have had minimal processing. The New York Public Library was very successful with its historical restaurant menus project. The New York Times launched a project in October with its digitized print ads from the 1960s and asked the public to transcribe them. And the Smithsonian launched its own Transcription Center that has materials and manuscripts from across the institution; the public can go to the site and do the transcription while another person does the review.
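As a small sketch of a Dublin Core record like those discussed above, Python's standard xml.etree can emit a simple record in the DC element namespace. The element values here are invented for illustration, and the plain `record` wrapper element is an assumption (in practice the record would often sit inside METS or another container):

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

def dublin_core_record(fields):
    """Build a minimal Dublin Core XML record from a dict mapping
    DC element names (title, creator, date, ...) to values."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    for name, value in fields.items():
        el = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        el.text = value
    return ET.tostring(record, encoding="unicode")

# Example record with invented values:
xml = dublin_core_record({
    "title": "Oral history interview",
    "creator": "Example Archives",
    "date": "1996-05-01",
    "format": "audio/wav",
})
```

Because the output is plain XML, it stays both human- and machine-readable, which is exactly the appeal of XML noted earlier in the talk.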
So we are starting to wrap up, and I wanted to close with some final notes on resources. If you're looking for some good basic documents on born-digital materials, check out OCLC's white papers. They get into the basics, focusing on what you can do now, as well as transferring materials in-house and working out agreements when you send materials out. And if you can't figure something out, hold on to it, because you might be able to eventually. And don't reinvent the wheel: many cultural heritage institutions and other organizations are facing the same challenges and working on solutions. The Digital Preservation Q&A website at q&a.digipres.org was launched earlier this year by the National Digital Stewardship Alliance and the Open Planets Foundation, and questions cover everything from multi-page TIFF as a preservation format to checksum monitoring tools to hardware issues. So here are some references to some of the projects and links mentioned today, and there are many more out there. There is no one-size-fits-all solution for our organizations. These links and some others are also available on the Connecting to Collections site, but I just want to review them quickly. Again, PSAP will be a web application designed to address the evaluation and prioritization of preservation needs among collection materials; it will be launched next year. The format ID site is now available. If you are interested in learning more about digital forensics, check out the BitCurator project. It is developing tools that deal with disk images and metadata. If you have the need to retrieve hidden or deleted files, this is worth looking into. They also have information about building a digital forensics workstation known as a FRED, or Forensic Recovery of Evidence Device. Again, the BagIt tool is a great thing to have in your toolkit. Fixity was developed by AVPreserve, and they also have some other tools on their website for metadata and digitized AV work. And there's also a report from NDSA about fixity.
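Fixity checking of the kind mentioned above boils down to recomputing checksums and comparing them against a stored manifest. A minimal sketch with Python's standard hashlib follows; the in-memory manifest format is an assumption for illustration, not the actual Fixity tool's format:

```python
import hashlib
import os

def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading in chunks
    so large preservation files don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest):
    """Compare current checksums against a stored {path: digest}
    manifest; return the list of files that changed or vanished."""
    problems = []
    for path, expected in manifest.items():
        if not os.path.exists(path) or sha256_of(path) != expected:
            problems.append(path)
    return problems
```

Run on a schedule, a check like this is what surfaces silent corruption or accidental deletion before the only good copy is gone.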
The file format tools JHOVE and DROID have both been around for a long time and have very strong user communities. PDF has become an important format for our types of organizations; again, NDSA issued a report this year about the concerns of using PDF/A-3 in archival institutions. If you have uncompressed WAV files, the BWF MetaEdit tool allows you to add the metadata to create a Broadcast WAV file. A quick side note here: PDF/A files and BWF files actually retain the .pdf and .wav extensions, so you cannot rely on extensions when you're working with those types of files. There's no such thing as a .bwf extension. FFmpeg is used across many communities dealing with AV materials; it has a very active group of developers and mailing lists, and the website was recently redesigned to be more user-friendly. MediaInfo works from both the command line and a GUI and can export audio and video metadata into text, HTML, and CSV. If you're dealing with CAD and digital design and architectural materials, MIT's and the Art Institute of Chicago's reports really dig into this topic. Again, CERP, the Collaborative Electronic Records Project, dealt with long-term preservation of email, and our website has our findings, reports, and links to the schema and the parsing tool. ePADD, or Email: Process, Appraise, Discover, Deliver, works with MBOX files or Gmail accounts, and it can allow the account holders themselves to decide what they want to transfer to the organization. It will also have the capability to flag emails that need to be restricted. AVPreserve has a good basic document on how to get started with the command line on the PC or Mac, and there's also a video from the Open Planets Foundation on the command line.
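The point above about not trusting extensions can be illustrated with a tiny signature check: format identification tools like DROID work from internal byte signatures, not file names. This simplified sketch knows only two signatures, chosen for illustration:

```python
def sniff_format(path):
    """Identify a file by its leading bytes (magic number) rather
    than its extension. PDF files start with %PDF; WAV/BWF files
    start with a RIFF header whose type field is WAVE."""
    with open(path, "rb") as f:
        head = f.read(12)
    if head.startswith(b"%PDF"):
        return "pdf"
    if head.startswith(b"RIFF") and head[8:12] == b"WAVE":
        return "wav"
    return "unknown"
```

A file named report.wav that begins with %PDF would correctly be identified as a PDF here, which is why real tools maintain large registries of such signatures.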
I didn't get into this today, but there's a very interesting project and paper worth checking out called "Good Enough": Digital Preservation Solutions for Under-Resourced Cultural Heritage Institutions, from POWRR, which stands for Preserving digital Objects With Restricted Resources. They looked at a variety of tools and options, including Archivematica for ingest and processing and Curator's Workbench for workflow and metadata, and tested them out. This was an IMLS-funded project that's wrapping up, I believe, next month. They also released version two of their tools grid, which lists many of the digital preservation resources that have been brought up today. And the DigiPres Commons site is working to provide a gateway to tools and resources for digital preservation. So a story I want to leave you with is regarding the Smithsonian's very first homepage. It's taken me years to find this homepage, because no one at the Smithsonian has come forward yet with the files from that first website. I'm hopeful they're out there somewhere, but I don't have them yet. But this page was actually crawled by the Portuguese Web Archive in 1996. So the lesson here is to keep pursuing multiple avenues, because you never know what you might come up with. And that is it from me, and I'm happy to take some questions. Great, thank you so much, Linda. This is Jessica again. I'm just gonna jump in quickly. We do have about 10 more minutes left for questions, and we've been collecting the ones that have been coming up so far. So be sure to go ahead and type any remaining questions in the chat box. Before our time is up, I wanted to go ahead and post a link to a short survey. If you could, please take some time to fill it out; we look at your responses very carefully, and they help us shape future events. So please go ahead, like I said, keep your questions coming. And Linda, I'll go ahead and direct you to our side of the presenter panel for the first question, regarding the use of DROID.
Okay, so I see a question: will DROID identify both Mac and PC files? Yes. Great, and then there was a second question, I believe, about programs like DROID: whether there are ones in particular that are better suited to Macs, or if they all work across platforms equally? It varies. I work mostly with PCs here at the Smithsonian, but in terms of JHOVE and DROID, you should be able to work on the Mac with those. Wonderful. And then, I'm having difficulty seeing the next question, Linda, if you can go ahead and field that one. Okay. So I see a question: is conversion to PDF or PDF/A typically only recommended for access copies? So should you keep documents in their original format when possible? So we actually consider PDF/A, specifically PDF/A-2b, our preservation format when we're dealing with word processing files and such, but they can also serve as access copies, obviously by making a copy of the PDF to use that way. And I do urge you to keep your files in their original format when possible, due to that example I presented about the Kodak PCD files that we were able to convert into better versions later on. Great. And then we had a question from Alex, saying that they've been scanning at 3,000 pixels wide. Is that high enough quality? So that depends on what your needs are at your organization. If you feel like you're getting good quality out of those images, then you could stick with it. But the standard lately across other organizations has been to bump it up to 6,000. Great. Thank you. And then we had another question. You had mentioned that doing a PDF of emails doesn't capture the headers of those messages. Is there any other content besides that that is lost? Well, the other issue when you convert to PDF is that you're also losing the relationships within the email account itself, because you're going to be getting all these single PDF files for all these individual messages.
And the email account preservation project we did actually keeps all of that information in one XML file. Great. We just had another new question come in. If you have a digital copy of a document as well as a paper copy of that same document, do you suggest keeping both copies, or do you end up deaccessioning one of them? We typically will keep both copies. With some of the older material, there are cases where we actually can no longer access the digital copy, and so all we have is the paper. Great. Thank you. And then we had a question regarding Photoshop. What version of Photoshop do you recommend if you're attempting to add information to a picture? I'm not sure, exactly, so I'm assuming that's referring to embedding metadata. There's a product called Bridge that works with Photoshop; it's actually part of Adobe, and it has batch functionality to add the metadata as well. But there are other software products out there that will work with images also. I see a question asking whether we'll be posting the slides from today's presentation. Yes, this recorded presentation will be available on the C2C webpage within 24 hours of today's conversation. And as I mentioned in the chat window, be sure to check out the C2C page. Linda has been very generous in sharing great resources with the group today, so be sure to check those links out, and you can explore more of the resources she discussed in the presentation. Okay, and Chris from Toronto is wondering: is it okay to change the embedded metadata in preservation masters? Well, does that mean the preservation master has already been created? It can be a little difficult to do that, and it depends on what you're trying to change. And if you are changing it, I would document what changes you did make. And I saw this question come up, and a few folks were generous enough to provide some links to some resources that could be useful.
But Margaret from Long Beach was wondering what you use to mark your DVDs, other than cases. Folks had shared some links to pens that they're using for marking those, but Linda, do you have any tips on that? A good rule of thumb with CDs and DVDs: if you are going to mark them, you want to do it on the inner hub of the media. But we don't do that here; we actually put our CDs and DVDs into Tyvek enclosures, and if we're going to do any marking, we'll do it on those. Great, that's a good rule of thumb. Eliza is wondering, does the Smithsonian capture to preservation masters for other organizations? Here in the Archives, we're the institutional repository for the Smithsonian, and essentially we are in charge of documenting the history of the institution. We do work with some outside associations that are closely aligned with our mission, so some various professional organizations, and we'll take in their records as well. And occasionally we will also crawl their websites. Great, and I see one final question from earlier: do you use BagIt for authored DVD disc images? For the authored DVD disc images, we use a tool called Ripstation that helps with the batch ingest of our CDs and DVDs, and it has the capability of creating the ISO disc image. And then, when we get ready to transfer that file to our backed-up server, we'll use BagIt to do that. BagIt essentially is a transfer tool and does not create a disc image itself. Great, well, I see we're reaching the top of the hour. I want to thank everyone for the wonderful questions, and I especially want to thank Linda for your time today and for sharing your wealth of knowledge. As I mentioned before, a recording of this webinar and the related resources will be available shortly in the community. In the meantime, I hope you all have a pleasant afternoon, and thank you again for joining us today.