Hello everyone. My name is Joy Banks and I am the project coordinator for the Clear Strategies for Advancing Hidden Collections six-part webinar series. Welcome to our fourth webinar, Collection Access: Describing, Cataloging, and Processing with the Future in Mind. This series is offered through the generous support of the Andrew W. Mellon Foundation. It is my pleasure to introduce our speaker for today, Beth Knazook. Beth trained as a photographic preservation specialist at Ryerson University and George Eastman House Museum of Photography and has worked as the curatorial specialist for Ryerson University Library Special Collections and as the photo archivist for the Stratford Festival of Canada. She has been involved in a number of digitization and cataloging projects spanning different collections and descriptive standards and teaches courses on the digitization and description of digital image collections for Library Juice Academy. She is interested in how varying descriptive practices shape the interactions that researchers and community users have with photographic collections. She is currently working toward her PhD at Queen's University, where her research focuses on the introduction of photographic illustration into Canadian book publishing in the mid-19th century and the complex relationship that developed between early photography and 19th-century print culture. Please welcome Beth. Thanks very much, Joy, and I just want to say welcome to everyone to the fourth webinar in this series. By now you've covered a lot of ground in these webinars. You've talked about planning and managing the scope of your projects, and how to budget successfully and acquire and manage the people needed to carry out your goals, whether that's paid or volunteer. Now we find ourselves at the stage where we need to talk about how you are going to do the work of processing and describing your resources so that you can share information about your collections. 
We'll be discussing good practices for structuring the form and content of your records, mainly to optimize them for sharing on the web, but also to give your work longevity. We do have a future focus in this webinar. Employing descriptive standards allows us to create robust catalog records and descriptions that will be usable and understandable for a long time to come. I should mention at this point that, like you've heard with the other webinars in this series, the topic is a lot bigger than we have time for here, so if there's anything that I don't explain in a lot of detail, please use the Q&A forum to ask questions that we can talk about at the end of the webinar, and please make sure to have a look at the resource library after the webinar for a lot of detailed instructional material on specific standards and tools that I will be mentioning. You can see that I've broken this class down into four main topics of conversation. I want to start with some broad thinking on cataloging and description to try and find some common ground to launch from. I'm sure that there are many different levels of expertise and experience in the room, and I think that we probably all have something to say on the topic of cataloging. Next, we will discuss some practical information about established standards used in cataloging and describing resources across the GLAM sector. Third, we'll talk about how to deal with those lingering old catalog records in order to bring them up to these standards, and we'll finish by talking about how to share those records so that we achieve the visibility we want for our hidden collections, and I'm primarily talking about visibility on the web. So I'm going to start with a discussion. I'd like to begin by talking about why we do what we do. What is it we are trying to accomplish by creating records about our collections? I should note that although I use the term cataloging, that's really just a tendency of my own based on where I come from. 
I mean this to encompass descriptive practices across GLAM organizations. So if it's easier for you to think "what is describing," then go for that. Okay, our time is up. Thank you all very much. That was a very interesting first discussion, and I hope that what we can take away from that discussion is that cataloging is a lot more than just data entry. I think that actually came up in one of the first webinars: how do you convince people that it's more than data entry? Cataloging and descriptive practices generate the necessary information to, and I'm quoting here from this big blurb, identify, authenticate, describe, locate, and manage resources in a precise and consistent way that meets business, accountability, and archival requirements. Cataloging gives us the information that we need to be good stewards of our collections and to connect our communities to those collections. Now for those of you who are not familiar with the term metadata, because I've just sort of jumped over into that, this refers to all the data that we generate in the course of our work. It is often referred to as data about data, or as we see it defined here by the International Organization for Standardization, data describing the context, content, and structure of records and their management through time. So building from that definition, we can extrapolate some of the major benefits of cataloging. Firstly, it describes our collections to our end users, and quite a few of you said it was a means of access. And it is an access tool. It's often an interpretive tool too. Increasingly, online metadata is replacing in-person interactions with our collections. Secondly, it supports our daily activities, so it is also an administrative tool, not just description and access. It helps us to organize our materials, monitor them, and generally take better care of them. 
The more information we have about our collections, the more easily we can identify issues, seek conservation treatments, or adapt procedures. We are not truly accountable for our collections if we don't keep records. And lastly, it connects our content to that of other institutions. It makes us relevant. It establishes where our usefulness and our authority lie, and thus also provides us with a means of reaching and impressing donors. If we don't have information about our collections, how do we explain to anybody what we're doing? And leaving you with that particular thought, I'm actually going to jump you right back into another discussion room. I'd like to take that little break to talk about why we don't always reap these benefits. So what do we perceive to be some of the barriers to cataloging? What are some concerns you have when cataloging? And why is it something that doesn't always get done, that gets put on the back burner? Okay, so it appears we have some valid concerns to work through. Now, I tried to sort of anticipate what most of you were going to say in the forum, and I'm thinking that actually most of what I saw can be summed up with these three barriers. So if you decide to review this webinar later on, be sure to go back to the notes and make sure that I've captured everything here. But I think they mostly fall under these categories. We've got lack of trained or available staff to create records, and additionally, not enough time or resources available for remedying this problem. Another barrier: poor access to technology or technological support, or our technology has become redundant, is what some of you have been saying. We feel like we don't have the right equipment to do the job, or we're afraid that technological investments will not have longevity. And I think, yeah, we've all been burned in the past over some piece of software or hardware that was discontinued or couldn't be upgraded, and so we might be a bit wary of technology. 
And lastly, and I thought this would be the biggest one for the group, just given the theme of our webinar series: it's very easy to feel invisible when you don't have a network of colleagues and collaborators to bounce ideas off of. And when we feel isolated, we get scared that we're not doing things right or that we're falling behind the trends, and that can be a very powerful inhibitor. It's understandable to be worried about these things, but as intimidating as these barriers might seem, I want to assure you that they're not insurmountable, and there are some specific things we can do to break these barriers down. The first one is to meet staff at their skill levels. There's a common misconception that quality records are highly detailed and complex, but for me, good metadata is above all consistent and reliable, and not necessarily complicated. I hope last week's webinar gave you some good ideas about how you can make the most of your people, and I'm just going to add my two cents about effectively using volunteers to catalog. If you have volunteers or staff without a lot of training, and they don't have the expertise they need for particular materials, you can focus on training them to fill out a few fields, or some part of the catalog record, and to fill that part out well. Have examples available and give them the time and reassurance needed to boost their confidence. Allow them to do research and allow them to learn about the topic, because it will benefit you to let them improve their skills. If you want good catalog records, let them think about what they're doing. If staff time is a real issue, and not necessarily expertise, you might focus on crowdsourcing and use your staff to check and correct records rather than to create them all from scratch. The next thing you can do to break down barriers is to seek training opportunities and budget for them. 
If you really need your staff to jump up to another skill level and there isn't enough money for additional support or training to get your catalog records to the desired level of completeness, can you budget for it in the future, whether by looking for grants or reallocating funding from donations or fees, et cetera, any of the sources discussed in the second webinar? The next one is a big one for me. Start with the technology that you have. Don't worry about the technology that you don't have. Do you have access to Microsoft Excel? Can you create tab- or comma-delimited files in a text editor? It is not as crucial as we might imagine to have the latest technology. Part of the reason why we are all trying to follow best practices and use metadata standards is so that we can manipulate, transform, and reuse our metadata in the future. Whatever technology you use now to create your catalog records will not be the technology that it lives in forever. Keep in mind, your database programs going forward should allow you to export anything you've entered, and your community partners, if you're part of a consortial database, should also allow you to access and use the metadata you've created for collegial projects, so you can get that back out. As Angela mentioned in the very first webinar, try to use programs that support data export in open formats like CSV or XML, or use programs like Excel that create data in these formats in the first place. I want you to remember that records and databases grow old and outdated and obsolete, but good metadata is ageless. And I want to note, just sort of as an aside, that I was in this boat where I had several years of acquisitions and descriptive records managed in catalogs and Excel spreadsheets, and I was able to import them easily into our new database when we finally bought one, and that's because I created good catalog records with spreadsheets. 
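To make that spreadsheet point concrete, here is a minimal sketch of what keeping catalog records in an open format like CSV can look like, using Python's standard csv module. The records and field names here are purely illustrative, not drawn from any particular standard; the point is only that a plain CSV file can be re-imported into whatever database comes next.

```python
import csv

# A few hypothetical catalog records kept as plain dicts;
# the field names are illustrative, not from any specific standard.
records = [
    {"title": "Green Gables", "creator": "Montgomery, L. M.", "date": "1908"},
    {"title": "Sample Album", "creator": "Unknown", "date": "ca. 1890"},
]

# Writing to CSV keeps the data in an open, widely readable format,
# so it survives the software it was created in.
with open("catalog_export.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "creator", "date"])
    writer.writeheader()
    writer.writerows(records)
```

Almost any collections database or spreadsheet program can ingest a file like this, which is exactly the kind of portability the open-formats advice is after.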
Now, my next point, seek community partnerships and be willing to join projects, is also kind of a big deal for breaking down barriers, and I'm hoping that you all recognize that you've kind of started to create community partnerships just by being a part of these webinars. Don't be afraid to try new things. Sometimes we shy away from opportunities because we are afraid that we don't have the expertise or that the project isn't going to be a good long-term investment, and that can prevent us from building helpful relationships and generating interesting metadata that we can then use for other things. Because remember, we want to be able to get our metadata back out of things that we contribute to, which means we have metadata that we can then reuse. And that brings me to my next point. Use social media and incorporate social tools into your cataloging. Take advantage of the crowd. By allowing others to comment on your records and share your resources through the use of social media, you will help to make your data more visible without requiring a lot of complicated in-house technology. And lastly, and we'll talk about this more at the end of the webinar, I promise you: create linked open data and use controlled vocabularies. Linking is one of the most powerful things that we can do on the web. Linking allows us to pull in information from other systems, and it establishes connections between our metadata and that provided by others. I can't really stress this enough. Lateral connections build authority, visibility, and relevance, and they add a richness of information to your records that you may not have had time to create in-house. And you might be surprised to find that you've already kind of been doing this if you're using controlled vocabularies or even just incorporating HTML links into your records. But again, we'll talk more about linked open data towards the end of the webinar. 
We are not all going to be working at the same level with our cataloging, but we can all do something to break through these barriers. And I hope that by the end of this webinar, you're going to feel confident enough to make some choices and to move forward with those choices. You don't need to possess significant technical skills to create high-quality metadata. But that said, you do need a plan. Identifying the metadata that you need to capture is not really as simple as looking into existing cataloging standards and just filling out all the fields. Institutional cataloging or metadata plans must encompass all the information that you want to share as well as the information you need to effectively manage your collection. As such, the guidelines that are created to govern cataloging practices are living documents that respond to an organization's changing needs, competencies, and outlook. These plans are enhanced through collaboration, or they can be, both within your organization and with colleagues in the broader information and cultural heritage communities. A good cataloging plan describes how to catalog, giving rules and references for your catalogers to follow. It might be fairly brief, just giving a few unique rules and then referring the cataloger to a published manual like AACR2 for those of you in libraries or DACS if you're in archives. But it is developed in response to broader ideas. Your cataloging plan should be developed in response to general questions. What do you need to know to manage and share your resources? What do your user communities need to know? And how will you preserve your work? So you need three types of metadata about your collections to manage them and share them effectively: descriptive, administrative, and technical or preservation metadata. I'm not going to go into this very much right now, because in the next section we're going to cover metadata standards that govern how we record this information. 
And moving on to the next question: when thinking about how you record your information, consider how your users will understand it. Describe your collection so that non-experts can understand the material quickly and easily. Explain what is important or interesting about your materials if you have that information available. Don't bury extra information in note fields or on another part of the website if it's relevant to the description. Remember that you are acting as an interpreter for the most part, particularly in those circumstances where there's no digital image to accompany the catalog record and users are only looking at your catalog record. You have the item in front of you and your users don't. And then, as far as what else your user communities need to know, don't be afraid to imbue your resources with your own brand. Don't take for granted that they will know that the resource belongs to you, or that they will know whether you have a copy of something or hold the original. You need to be clear about your ownership of the materials, and copyright information is a huge part of that. It's one of the most important pieces of information that your users will look for on the web, and it is surprising how often this is overlooked or buried in websites. If you are digitizing content that is in the public domain and you want people to use it, make sure that they know that. If it's not fine for them to use, make sure that they know that too. Don't hide this information somewhere obscure. And lastly, we need to address how you will preserve your work. Like the real estate mantra, location, location, location, you should by now be getting a sense of the GLAM mantra: document, document, document. Create guidelines describing in detail how all aspects of cataloging are to be carried out. 
You should outline all the required fields and content choices, who is in charge of adding information and improving records, and what resources or controlled vocabularies are necessary to complete the work. Cover just about everything that you can think of that might deal with somebody taking the materials out, looking at the materials, and putting the materials back. And lastly, use the standards and practices shared by GLAM communities. Don't reinvent the wheel unless it's absolutely necessary. The steps you will take to address these questions in the final plan might follow a similar sort of three-part structure. Outline how much information is available. Be realistic about what it is possible to know about your material, how much you can rely on your catalogers to know, and reconcile this with what you know you need to manage your collections and your project. Next, identify your user communities and determine their needs. Conduct a needs assessment. Survey your users. Invite comments on your website or use social media. Find out what your users want to see and what they need to know. And then determine your desired output, and hopefully this involves selecting an established metadata standard that can capture all of this information and which supports the complexity of relationships demanded by your records. A good example of this is that archives require a hierarchical arrangement of their records, so picking a metadata standard that doesn't support that wouldn't be terribly useful. So now that I've mentioned metadata standards a few times in this webinar, I think it's time we took a closer look at them. Metadata standards govern how we organize and present information about collections, and we use these standards primarily because standardized information promotes the exchange of information. This exchange is necessary because descriptive practices in GLAM organizations vary considerably across the field. 
And sometimes it even varies within individual institutions. Although we might record a lot of similar information, like title or date, we put the emphasis on different areas of knowledge. There's an important emphasis placed on describing the provenance of materials in archives, while in museums there's a lot of weight given to explaining what things are, as well as a need to track their intellectual history within the institution, like what exhibitions they've been a part of. There's a much greater emphasis on subjects as an entry point in libraries, which is something we've only recently seen in archives and museums, and which I believe has been brought about by the need to provide subject access to digitized image collections. In more recent years we've seen widespread adoption of standards designed to try and formalize the way that we communicate so that we can communicate with each other. And I apologize to those of you who are well versed in metadata standards, but I'm going to start with a very simple explanation of why metadata standards exist, for any of the participants who have never cataloged or who haven't encountered standards before, and I saw from the lobby poll just before we came into this that it looks like maybe somewhere around a third of you are in the boat where you're not using metadata standards or you're not sure. Okay, so this is information about a book in a library that I have communicated to you. Now, although I have succeeded in the task of conveying the correct information about this book, the way I've presented it renders it pretty much useless to most audiences. Unless you're the one who wrote this down, how do you really know what this information describes? A lot of what makes information useful is the way we structure and display it. Here we see the same information represented in a structured format that clearly shows us which pieces of information describe which aspects of the thing that I described. 
The information has also been formatted so that the words are spelled out. You know, we're getting "Green" instead of "Gr" in the title of the book, and the author's initials have been capitalized so it's clear that those letters represent a proper name. The use of field labels for structure, and proper spelling and capitalization for the content, has turned that useless string of data into understandable information. The quality of our information also depends to a certain degree on transparency. In a lot of cases, we understand certain things about the objects we have in our hands that would not be clear to someone who is not also able to hold that object in their hand. If I were to add another piece of information to my catalog record here, something that cannot be transcribed from the object itself but results solely from my understanding of the object, you'd know even more about what this collection item is. I just wanted to add this because we take the most obvious information for granted, and it might be okay if the record remains within our own databases, but when we exchange these records with other systems, that sort of really straightforward intellectual information is some of the first stuff that gets lost. Structured, controlled, and well-formed metadata makes people, concepts, and things distinct in our minds. But it does much more than that too. It makes information about our collections useful for computers. Hollywood would like us to believe that computers are two steps away from taking over the planet, but they're really not that clever when it comes to understanding information. We have to give our databases and systems a lot of rules to yield productive results. And if you think about that first string of information I presented you with, understand that as much as a human reader may have struggled to make sense of it, the computer has no idea what any of that means. 
You might be able to return records based on keyword searches, because the computer can identify a string of numbers and letters to match whatever you typed in, but you need to give your records some context in order for databases to carry out sophisticated processing like sorting and aggregating. Here we see an example of that. The computer can sort by title because you've consistently put the title information in the same field and you've asked it to search only that field. The computer can also find relationships between distributed information. It understood that the author name in each of these records referred to the same person, and so it was able to collect these records together. Keep this in mind for later, because the more precise your terminology within a field, the better a computer will be at finding relationships. So in short, the more consistent you are with the way you express something and where you put your information, the easier it is for both human and computer to make sense of what you've catalogued. The need for structure and consistency applies to all types of records you might encounter as a cataloger. Some of you will be working in institutions where cataloging is an entirely separate task from acquisitions and processing, while other institutions may lump this kind of stuff together with cataloging, particularly if you're working in a one-person operation. To recap from what we saw earlier, you need three key types of information to manage your resources: descriptive, administrative, and technical/preservation. Descriptive information is meant to be seen by the public, while administrative and technical metadata is mostly behind-the-scenes kind of stuff, and I'm actually going to jump to the behind-the-scenes stuff first just to get it out of the way. So here's an example of administrative metadata on the left there. 
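The sorting and aggregating behavior described here can be sketched in a few lines of Python. This is a hypothetical illustration: the records and names are invented, and the point is simply that consistent fields and identically expressed name strings are what make the sort and the grouping possible.

```python
from collections import defaultdict

# Hypothetical structured records: because title and creator always
# sit in the same named fields, a program can sort and aggregate them.
records = [
    {"title": "Rilla of Ingleside", "creator": "Montgomery, L. M."},
    {"title": "Anne of Green Gables", "creator": "Montgomery, L. M."},
    {"title": "The Blue Castle", "creator": "Montgomery, L. M."},
]

# Sort on the title field only; possible because the field is consistent.
by_title = sorted(records, key=lambda r: r["title"])

# Collect records that share the same creator string. This only works
# when the name is expressed identically in every record, which is
# exactly what controlled terminology buys you.
by_creator = defaultdict(list)
for r in records:
    by_creator[r["creator"]].append(r["title"])
```

If one record had spelled the creator "L.M. Montgomery" instead, the grouping would silently split into two authors, which is the consistency point in miniature.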
There are many different manuals that have been produced over the years that can help you to decide what administrative information you will require to manage your collection's materials, but it's mostly up to you how to structure this information and where to keep it. This is generally not shared data, so as long as it is standardized internally, it can be used effectively in-house. I have some examples here of manuals that address administrative metadata. The Small Museums Cataloguing Manual by Museums Australia has guidelines on registration and naming conventions for file images. The Standards for Archival Description handbook has a chapter on labeling and filing. These are examples of the kind of information that is generated for local use and should still be standardized, even though there are no external standards governing it. Technical metadata is relatively new to the cataloging scene. This type of information pertains to digital objects, and it is recorded to ensure that digitized collections and born-digital files retain their integrity and usability over time. Born-digital refers to those files that were created digitally and have no real-world counterparts, so for example, this PowerPoint presentation is an example of a born-digital file. The content on the left of the slide is an example of technical metadata recorded for a digitized photograph. As you can see, most of this information seems pretty straightforward. It's the kind of stuff that can be found simply by clicking on the file on your desktop. Technical metadata has a lot of overlap with preservation metadata, which is used to verify provenance and authenticity in digital objects. If you are planning on creating digitized collections, you are going to want to take a closer look at some of the guidelines that have been developed to deal with this information, such as the PREMIS Data Dictionary, which I've got here, or the ANSI/NISO Technical Metadata for Digital Still Images standard. 
The checksum field at the bottom left over there is an example of preservation metadata. A checksum must be generated by a computer program designed to create checksums, and it is used to monitor digital files for fixity, basically to ensure that the files haven't changed. You utilize checksums by generating them periodically over the life of a file. If the number you get from a subsequent checksum doesn't match the original number, that means that the file has changed. It might be something as small as a single pixel that has become corrupted, but that's kind of the whole point: the checksum will notice problems before you do and allow you to take action, either by copying the file to another device, migrating it to a new format, et cetera. Now I will warn you that this is an area of cataloging that can get very technical very quickly, primarily when it comes to storing born-digital collections, which tend to be our most fragile collections. But before there's any panic in the room, I want to reassure you that no matter your level of expertise, you can record quality preservation metadata about your collections in adherence with best practices without relying too heavily on technical tools. If you are interested in the technical tools, though, I feel like this is a good point to name-drop the resource library for programs like BitCurator and Fixity. The purpose of preservation metadata is simply to answer the following questions that we see on the right there. What is it? Who created it? Where did the information about it come from? What can users do with this information? And how do I know that the information has not been altered? A lot of the technical tools that have been developed around digital preservation are designed to prevent objects from becoming altered, corrupted, or unusable, but that's just a tiny piece of the preservation pie. You can answer all these questions without resorting to technical tools at all. 
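For those curious what checksum generation actually involves, here is a minimal sketch using Python's standard hashlib module. The file name and its contents are stand-ins invented for the example; in practice you would run this against your real master files, store the hex string in your preservation metadata, and recompute it on a schedule.

```python
import hashlib

def sha256_checksum(path):
    """Compute a SHA-256 checksum for a file, reading it in chunks
    so that large master files don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in file so the sketch is self-contained; a real workflow would
# point at an existing master file instead.
with open("photo_001.tif", "wb") as f:
    f.write(b"stand-in image bytes")

# Record a checksum when the file enters the collection...
original = sha256_checksum("photo_001.tif")
# ...then re-run it later. A mismatch means the file has changed.
later = sha256_checksum("photo_001.tif")
changed = (original != later)  # False while the file is intact
```

Tools like Fixity wrap exactly this kind of loop in a scheduler and a report, so the choice between a script and a tool is about convenience, not capability.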
And in fact, if you have digital objects in your institution, you might have been recording preservation metadata without knowing that that's what you were doing. Just like we want to know all we can about how an item wound up in our collections, we want to know all we can about how something digital was created. On the left is a brief overview of some of the metadata required by PREMIS, published by the Library of Congress. We can see the first couple of fields there, file type, dimensions, and size, were also captured by technical metadata. The new stuff starts with inhibitors: if the file is password protected or encrypted for any reason, they want you to describe this in your records. You're basically going to write down anything that might prevent you from opening or using the file. The provenance field: literally, where did the digital object come from? Did it come in on a CD? What was the make and model of the camera it was downloaded from? If the digital object has moved repositories or has been altered in any way, you need to describe this. Particularly if a copy of the original file has been made smaller, that's a change to the file and that affects its provenance. Significant properties: are there any features of this digital file that are important to maintain but might not be apparent without opening it? For instance, if it's an animated file, like an animated GIF, you should make a note that you expect the file to, you know, move. It's not just a still image. And lastly, rights. This is a standard field in most descriptive records, but PREMIS considers it to be of vital importance to the usability of the digital object. And I'm begging you here: please record clear rights statements now, and your future selves will thank you. So, like with administrative information, it is largely up to you how to record preservation information. The PREMIS Data Dictionary is actually a set of guidelines, not a metadata standard or a rule book. 
There are some descriptive structural schemas that have fields for preservation information, but there are others that don't, and so you get to decide where you're going to store this extra information. There tend to be two popular options: one, the metadata is stored within the digital file, and two, the metadata is stored in a database and linked to the file. I'm going to look briefly at the first option. When a digital file is created, a lot of technical details are stored in the file automatically anyway, and you can use any image editing program to embed copyright statements and even descriptive information. I'm showing you an example here of an image that I digitized. I opened it in Adobe Photoshop, and I popped up the file info dialog box in Photoshop. I'm going to get my little pointer here. So we're looking at Adobe Photoshop, and we're looking at how we can embed information into digital files. Storing information in the digital file has some perks. The information travels with it, particularly if it's posted online. And if you don't have a database yet, this can help to manage your digital collection to a certain extent because of keywords, and I don't know if, Joy, you can drag the arrow down to the keywords section. There we go. See, I've added some keywords to this file. If you enter these in, you'll be able to search those keywords from your computer desktop file manager program, thus making your digital objects a little easier to manage. And I just want Joy to drag the arrow a little further down to... There we go, the copyright notice. So that's also something that I added into this digital file, which will remain with it. It means whoever opens it will be able to find out where this file came from. I think that's about it. Thank you for the arrow, Joy. Okay, so just going back to the keyword section there, be aware that the more places you put catalog information, the more time you're going to spend cataloging. 
So if you are duplicating this by cataloging it in a database as well, it's just going to take that much more time to also catalog these files. There are some programs that can allow you to embed metadata across a number of images, but stuff like the keywords is always going to be somewhat unique. I would say that most institutions tend to store technical and preservation metadata elsewhere, or they use a combination of information in a database and some minimal information in the digital file. And that's largely because of the warning I need to put here: if you have a problem with a digital file and you never copied any of this information elsewhere, you're going to lose both your file and the information about it. So duplicating efforts or storing information elsewhere tends to be more common. Okay, so this brings us finally to the kind of metadata that is governed by established standards: descriptive information. Producing good catalog records requires adherence to both structural standards and content standards. Structural standards, also called encoding standards, tell you what pieces of information you need to record. They determine the fields that you will use in your database and dictate what type of content belongs there. They also define the relationships and complexities of information. In EAD and CDWA, for instance, standards used in archives and museums, people or agents are described separately from their materials. While in MARC, which is far more common in libraries, and I think a lot of you here are using MARC, people or authors are added to a controlled index, but they are not described outside of the catalog record. So the standards have different fields, but they also have different functionality, which will affect how you catalog. Content standards define how you should describe the information in a given field. So the structural standard gave you the field; the content standard tells you what you're going to put into those fields.
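To make the structural side of that concrete, here is a minimal sketch of a record whose fields come from simple Dublin Core, built as XML. The record values are invented; the point is that the structural standard supplies the field names, while a content standard would govern how the title and name inside them are phrased.

```python
import xml.etree.ElementTree as ET

# Sketch: the structural standard (here, simple Dublin Core) supplies
# the fields; a content standard governs what goes inside them.
# All values below are invented examples.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

record = ET.Element("metadata")
for field, value in [
    ("title", "View of the harbour [graphic]"),  # bracketed qualifier per a content standard
    ("creator", "Smith, Jane, 1902-1988"),       # inverted name form per a content standard
    ("date", "1934"),
]:
    element = ET.SubElement(record, "{%s}%s" % (DC, field))
    element.text = value

xml_out = ET.tostring(record, encoding="unicode")
print(xml_out)
```

Because the fields are expressed in XML with a declared namespace, a record like this can be validated, exchanged, or crosswalked to another schema later on.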
For instance, do you transcribe the title with all capitals? Do you use square brackets to indicate uncertainty? Do you record the date as month, day, year, or do you write it out in words? Content standards create uniformity in the way that data is expressed, and they tend to be the standards that your catalogers will refer back to when they're developing a descriptive record. Most descriptive content standards were developed in conjunction with structural standards, which means that they pair well together. The structural standard CDWA works really well with CCO. MARC works well with RDA. It is possible to mix and match them, although you will need to watch out for conflicts between what the structural standard defines as the field information and what the descriptive standard suggests that you put in that field, because sometimes there are minor differences. If you use a field for a purpose other than that for which it was designed, you may have problems exchanging data with others, or you may not be able to use some of the interesting tools that were developed to work with these standards. You may even have trouble migrating your data to a new database in the future. So it is generally advisable to use the structural and descriptive standards that were designed to work together. And if you do make any local customizations, write them down so that these customizations are understood later on. Picking one of these pairs of standards is like joining a club. It ties you to a group of like-minded collections managers, and it establishes a style of presentation that builds familiarity among users. The thinking is that someone who has learned how to use one archive will be better equipped to navigate another archive. But you don't have to use the standards or the schema that everyone else is using if another would be better suited to your collection needs.
If you work in a museum but you think an archival standard would be good for your collections of manuscripts and documents, you might want to use EAD and DACS. Fit is key here, though. Our end users expect to see materials explained in a certain way because of the way our institutions have positioned themselves. And we don't want to do things so radically differently that our digital collections don't reflect what we do with our regular collections, or adopt a schema in which we have to shoehorn information into fields that, you know, really weren't designed for our purposes. That said, a good metadata standard will allow you a certain level of customization while ensuring that your records can be migrated and exchanged with other systems. That's designed with the idea in mind that we all have a lot of unique materials, or we might. The demands of metadata are constantly changing, and whatever standard you pick should be able to adapt. All of the standards that have been mentioned here are syntactically interoperable, which means that they can be exchanged with each other. They're all based on an XML framework, including MARC, which has MARCXML. This means that if you decide one day that you really need to do something completely different, you can move your data from one standard to another. And that is why we feel confident that collections described according to standards will be sustainable, because these standards are not technology-specific, and you can manipulate and exchange them with other standards. And so that is a good point to move on to the topic of legacy data. Everything we've talked about so far in terms of standards and guidelines gives us an idea of where we want to be with our metadata, the future direction, but, you know, what about the past? Many of us are dealing with leftover legacy data that makes us unsure if or how we can use older catalog records that may not have been made according to one of these standard schemas.
And on that note, I'm going to move this over into another discussion. Now, I've imported the poll here that we took at... or Louise has done that for me, thank you very much. We've imported the poll that we took at the beginning of class in the lobby, and I can see that 74% of you are dealing with legacy data, so that's a big chunk. What I would like to know now is if you have some concerns about dealing with legacy data. Okay, so it seems like quality is a big issue with these records. Interoperability... I didn't really see anybody saying reconciling different purposes, but I think we had somebody who was concerned that the old metadata didn't look anything like the new metadata that you're generating. Okay, so we can deal with a lot of these concerns, largely deal with these concerns, by following three of the four steps outlined here. First, we're going to identify all of the available metadata. Then we're going to build something called a crosswalk or data map, which essentially means that we evaluate field by field what metadata elements from our old catalog may or may not work in the new standard. If you encounter a lot of problems in this process, you can either decide to perhaps try a different standard that more closely aligns with the original format of the metadata, or you can take steps to edit or clean the data from your old catalog so that it fits better. And I expect that some of you have had experience with data cleaning before. There's a high probability that you're going to have to do some minor cleanup no matter how well your crosswalk works, but you can decide how much effort to put into this stage. And it may be that for some of you, the effort just isn't worth it to salvage the data, but I think a lot of you might find that cleaning up the legacy data is actually better than rewriting all the old data from scratch or recreating it. I put the last step in square brackets because not all of you will proceed to step four.
Crosswalks have broader applications and can be incredibly useful tools for preparing your records to contribute to union catalogs and shared repositories, to identify metadata for harvesting, or to allow for search and interoperability across systems. I've heard from the feedback on this webinar series that many of you are looking to move your records into a new system, so I will talk about databases briefly in a moment. So step number one, locate all your data. The first step before you can transform your records is to find all available records. And I'd like to emphasize this point because relevant and useful data about our collections can be found in a surprising number of places, particularly if you don't have a standard centralized catalog yet, and I know one of you said something about data being all over the place. It's easy to forget about a one-off inventory list that you did ten years ago that might contain a lot of good collections information. Even a Microsoft Word document can be transformed into a spreadsheet, saved as a CSV file, and crosswalked to a database, and can therefore provide a minimal population of a record that can be enhanced later. I would locate and evaluate everything I can get my hands on before discarding prior cataloging work, because recataloging from scratch might be more work, believe it or not. Okay, so in the next step you're going to create a metadata crosswalk. Once you've identified the metadata and the collection sources that you're going to use, you're going to begin creating a crosswalk or map that will help you to see how your existing collection data fits or doesn't fit into your new standard. You can start this process with something as simple as a spreadsheet. List all the existing fields in your current database and consider how they have been used. Consult recent and past records to get a sense for whether or not they've been used consistently, and if you have past cataloging manuals, those can be hugely useful here too.
Then take a look at the published guidelines available for the standard that you want to adopt. Now you're ready to map which field from the old catalog is equal to which field in the new standard. I have an example here showing which fields in the Dublin Core standard are equivalent to which fields in the VRA Core standard. Things are pretty straightforward so far. Once you make your choices, be sure to document your reasons for these choices so that if you run into issues later, you'll be able to remember your reasoning and adjust as necessary. Documentation throughout this process is also incredibly useful to help you explain yourself later, whether you're explaining to fellow catalogers how to use the new system or explaining to your administration or grant organization how you plan to carry out the task of standardizing and updating your metadata. So if we continue mapping fields from Dublin Core to VRA Core, we are going to eventually encounter some fields that don't map very well. My previous slide was showing you the shiny happy world, and this is where the problems start to pop up. GLAM organizations tend to use a lot of common pieces of information when describing collections, but they also express that information differently, and so the standards are constructed differently. Some fields might map to more than one field, while some might have nowhere obvious to map to. Be prepared for the fact that some standards will map more easily than others. Let's take a look at the first row. The contributor field in Dublin Core is defined as an entity responsible for making contributions to the resource. Agents in VRA Core are people, so it would make sense that contributor would map to agent. But the location field in VRA can refer to institutions or corporate entities as well as geographic locations. So therefore it could be that location is an appropriate field to map to as well.
Determining which field is the most appropriate will all depend on how the contributor field is used in Dublin Core. If it primarily contains information about the contributing institution, it would make sense to ignore agent as a possibility and map it to location only. So I'm going to take you away from the Adobe Connect platform for a moment and over to a website containing the Getty metadata standards crosswalk, and you should all see something pop up in your browser. Now I haven't taken over your screen or your computer or anything, so you'll need to navigate around this page to see all the columns and all the rows yourself. So just have a quick look at this. If you don't see this table, you can try copying the link in my PowerPoint and pasting it directly into your browser manually. This metadata crosswalk was produced by the Getty to address the mapping of the CDWA standard to the various other community standards used by GLAM, both structural and descriptive, and we see CCO, CDWA Lite, VRA Core, and across the board MARC, MODS, METS, DACS, etc. This is a fantastic tool if you are already working with a metadata standard and you want to share your metadata with other institutions or contribute to shared repositories. It might also be helpful to consult if you are using different standards in your institution and you want to figure out if they can be amalgamated into a single database or queried by a single search engine. Finally, it might be useful if your old catalog records are kind of similar to an existing standard and you want to see where there might be potential problems with mapping to a new standard. This is not the only one of these types of crosswalks that exists, but it is the most comprehensive in terms of the number of standards addressed, so I find it quite useful. Now if we can all come back to Adobe Connect, and I have to trust that you will come back to Adobe Connect yourselves, and move on to cleaning up our data.
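In code terms, a crosswalk is just a field-to-field mapping applied record by record. Here is a minimal sketch, with invented field names on both the old and new sides; note how anything that doesn't map is set aside rather than silently lost.

```python
# A minimal sketch of a metadata crosswalk: a mapping from old catalog
# fields to new standard fields, applied record by record. The field
# names on both sides are invented for illustration.
CROSSWALK = {
    "Title": "dc:title",
    "Maker": "dc:creator",
    "Date Made": "dc:date",
    "Notes": "dc:description",
}

def crosswalk_record(old_record):
    """Return a record keyed by the target standard's fields.

    Fields with no mapping are collected under '_unmapped' so the
    information is not silently lost during migration.
    """
    new_record, unmapped = {}, {}
    for field, value in old_record.items():
        if field in CROSSWALK:
            new_record[CROSSWALK[field]] = value
        else:
            unmapped[field] = value
    if unmapped:
        new_record["_unmapped"] = unmapped
    return new_record

old = {"Title": "Harbour view", "Maker": "J. Smith", "Shelf": "Box 12"}
print(crosswalk_record(old))
# {'dc:title': 'Harbour view', 'dc:creator': 'J. Smith', '_unmapped': {'Shelf': 'Box 12'}}
```

A spreadsheet crosswalk works the same way conceptually; the `_unmapped` bucket is where the judgment calls discussed above, like contributor versus agent versus location, get resolved by a person.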
So having looked at mapping the structure of our old catalog records onto a new structural standard, we need to figure out if the content is useful the way it was previously recorded. A lot of editing of old catalog records can be automated through simple spreadsheet software. You can remove extra rows and duplicate records, or do a find and replace on Unicode characters that didn't import properly, although a caveat here: be very careful with find and replace because it may find and replace things that you were not anticipating. Excel can also split cells with multiple values, and we see an example of that here. A couple of rows down in my old record data I had two values separated by a semicolon, which I have now split into two separate rows. Don't expect everything to automate perfectly. Make sure you do spot checks on records before you import or move on to the next step, even if you are using a more robust data cleaning tool like OpenRefine or Data Wrangler, and these are tools that are linked in the resource library if you want to learn more about them, because they can do a lot more than your spreadsheet can. You'll have to be prepared to deal with data that just has to be cleaned by hand. In the last row we see a poorly composed description that has been greatly improved by rewriting. If you know you're going to have a lot of problems with certain fields from the old catalog, you can choose not to map those fields, but you will lose the information on the records unless you do copy them by hand. Cleanup is not something that should be done on the fly. Before you undertake this process, review the catalog information with your colleagues and staff and clearly outline your expectations. Identify the staff members who will be responsible for certain steps and clearly detail proposed actions and decisions.
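The multi-value split mentioned above is one of the easiest cleanups to automate. Here is a small sketch, with invented identifiers and subject values, that turns one row with semicolon-separated values into separate rows:

```python
# Sketch of one common automated cleanup: splitting a multi-value cell
# (values separated by semicolons) into separate rows. The identifiers
# and subject values are invented examples.
rows = [
    ["obj-001", "portraits; landscapes"],
    ["obj-002", "maps"],
]

cleaned = []
for identifier, subjects in rows:
    for subject in subjects.split(";"):
        # strip() removes the stray spaces left around each value
        cleaned.append([identifier, subject.strip()])

for row in cleaned:
    print(row)
```

As with find and replace, spot-check the output: a semicolon that was part of a sentence rather than a value separator will be split just as cheerfully, which is exactly the kind of surprise the caveat above warns about.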
I'm showing you here an example from one of the many spreadsheets that I used in my own past work, in an instance where my department and another department were going to begin sharing a database, and we had to come together to talk about different problems that might arise from importing two different sets of metadata. So although we made separate crosswalks, we had to come together over the cleanup. As it turns out, one of the problems that we encountered was that we were using duplicate accession numbers, which could have been a big issue if we hadn't caught it. Most of us will want to move our old data from existing databases into new databases. Many of us here will. So let's spend a moment talking about purchasing a new database. How do you decide on the right system for storing all this information? The first point, and probably the most obvious one, is that it supports your selected structural or descriptive standard. Not all database vendors are clear about what standard they have based their field design on, if they have based it on anything. Others are more upfront. You might have to test out the software to be sure that it will work for you. Next, it should provide appropriate fields for administrative and preservation metadata if needed. You might be using a database in which you store descriptive information separately from administrative and preservation metadata, but if you do need those fields, make sure it has them. Because these types of metadata are not governed by standards, there's no easy question to ask to determine if the database will work for you. You just need to ask questions and use the PREMIS or NISO data dictionaries to give you some idea of what fields to ask for, maybe alongside some of the manuals for archival, museum, and library administration. It should support the OAI protocol for metadata harvesting. This is an exchange protocol that has historically allowed you to expose your data to the web, so it's a big one.
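If you've never seen OAI-PMH in action, a harvest is just an HTTP request with a few standard parameters. Here is a sketch that builds a typical request URL; the base URL is a made-up example, but real repositories expose a similar endpoint that aggregators query the same way.

```python
from urllib.parse import urlencode

# Sketch of what an OAI-PMH harvesting request looks like. The base URL
# is a made-up example; a real repository exposes a similar endpoint.
base_url = "https://example.org/oai"
params = {
    "verb": "ListRecords",          # ask for full records
    "metadataPrefix": "oai_dc",     # request simple Dublin Core
    "from": "2015-01-01",           # optional selective harvest by date
}
request_url = base_url + "?" + urlencode(params)
print(request_url)
```

The repository answers with an XML stream of records, which is why a database that supports the protocol can expose your catalog to union catalogs and aggregators with so little extra work on your part.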
The next one: you need to make sure that you have the technical support to install and implement the database. Often collections databases require more than just clicking a download icon and waiting for the database to install itself. Server-based databases usually require additional software licenses to mount and run, and they may require an IT staff person to maintain if you are not an IT professional yourself. Some companies offer technical support packages for an additional cost, which you'll want to be able to budget for if you need it. Which brings us to the next point: your database should fit your budget. You've heard this before in the previous webinars, so free is not free. Open source software will have costs associated with server maintenance and customization. Vendor software might require costly upgrades. Ask questions of the vendors and developers and other institutions who have used the database. Finally, you need to make sure that your database has exit strategies. Some of you may be dealing with old databases where it's not clear how to get the information out. You don't want to be stuck in that position again. If the company that manages the database goes under, what happens to your data? Will you be able to export it out of the database into one of those open file formats that we mentioned before, something simple like XML or even CSV? If you don't upgrade, will you still receive support? If the data is being stored in the cloud, where exactly are the servers located and what procedures are in place to protect and preserve your data? In terms of actually picking a database, the resource library is a really great resource if you want some specific software recommendations. And that brings us to controlled vocabularies and linked open data. The metadata that we create using structural and descriptive standards promotes interoperability, but connecting distributed pieces of information across the internet requires additional effort.
Remember the example from the start of the webinar showing a list of book titles by the same author. Controlled vocabularies and linked open data allow us to define relationships across records so that we can pull that information together. Keywords derived from controlled vocabularies and established indexes provide a much higher level of interoperability than standardized descriptive content alone. They help to avoid ambiguity between similar terms. If you say the word iris, are you describing an eye or a flower? They can give an official term for something. Funiculars, for instance, are described as outdoor elevators that operate on inclines. And they can describe relationships between concepts. For instance, a Siamese cat is a more specific term than cat, and grapes are related to wine making. How do we create and add controlled terms to our records? First, and most common, we work from an existing vocabulary resource. Certain communities have developed extensive thesauri that serve their specific needs. The Art & Architecture Thesaurus, for instance, provides detailed terms for describing artistic concepts, periods, genres, materials, and techniques. Another type of controlled list is available through the Library of Congress, which provides an extensive list of published author names in their Name Authority File in addition to their subject headings. You can access these lists online for free or with a subscription, and in some cases you can purchase a database that has these controlled terms pre-loaded for you. Your descriptive standard may suggest certain vocabularies to use with certain fields, or you might have the option of using several. If you are uncertain which vocabularies might be right for you, you can use the websites to clarify their purpose. Bear in mind that they are not all designed to be used for people and subject terms, which is, I think, how a lot of us tend to think of them. They can describe locations, physical description, time periods, etc.
You might want to create a local vocabulary list if your collection material is really unique or esoteric, but a good compromise is to use established vocabulary alongside local terms. So you want to clearly identify your local terms as unique to your institution and use them to add breadth to your resource in addition to working from an existing vocabulary. Now, there will be circumstances where you do have to create your own terms as you go, working from guidelines. We know that our descriptive content standards define the types of information that go into certain fields, and sometimes they are so specific about the phrasing of that information that it effectively creates a local controlled list. In DACS there are rules for composing person access points for records, and an access point is an index term or a controlled entry point into the records. Most archival repositories, or in some cases museums, do not use a controlled vocabulary for people because it's just too difficult to anticipate what names will be needed. So each of the published vocabulary resources that we saw listed there has its own quirks, and I can't instruct you on how to use all of them at once, particularly since I think I might be running a little bit long on this webinar, but I can outline some general guidelines on how to use them. First, read the scope notes and observe the hierarchies to confirm that each term is the desired one. Sometimes vocabularies are just lists, but most have scope notes that describe what terms mean and have visible hierarchical structures, so you can decide whether the term you've selected is an accurate choice. Always use the preferred form of the term. Just because you can look something up in a vocabulary doesn't mean that you should use the term that you looked up. There is always a preferred term identified, and if you haven't landed on it, you'll see a note directing you to use the preferred term instead. And lastly, use the narrowest term applicable.
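Those lookup guidelines can be sketched in miniature. Here is a tiny invented vocabulary with preferred terms and a broader/narrower hierarchy, showing how a "use preferred term" cross-reference resolves; all terms and relationships here are stand-ins for what a real thesaurus like the AAT provides.

```python
# A tiny sketch of a controlled vocabulary with preferred terms and a
# broader/narrower hierarchy. Terms and relationships are invented
# stand-ins for what a real thesaurus like the AAT provides.
VOCAB = {
    "funiculars": {"preferred": "funiculars", "broader": "railways"},
    "inclined railways": {"preferred": "funiculars"},  # use note: see preferred term
    "railways": {"preferred": "railways", "broader": "transportation"},
}

def preferred_term(term):
    """Follow the 'use preferred term' note, like a thesaurus cross-reference."""
    entry = VOCAB.get(term.lower())
    return entry["preferred"] if entry else None

print(preferred_term("Inclined railways"))  # resolves to the preferred form
print(preferred_term("funiculars"))         # already preferred
```

A cataloger looking up the non-preferred form gets redirected to the preferred one, which is exactly what keeps everyone's records converging on the same indexing term.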
If the vocabulary is arranged hierarchically with broader and narrower terms, just pick the most specific one. And I have an example set up here where I took a screenshot from the Getty Thesaurus of Geographic Names, where I'm looking for a town called Brazil in Tennessee. Now, I can see from the hierarchy of this record that that is what I have gotten. If I was looking for Brazil the nation in South America, I could see immediately that I'm not in the right spot, so I could back out and look for another term. I can also see that Brazil is a more specific term than Tennessee based on that hierarchy, so I would pick Brazil. And Brazil is the preferred form of the name: if you look just above the hierarchical position, there are the names Brazil and Poplar Grove, and Brazil is preferred. So I shouldn't use Poplar Grove, even though that might also be a way to refer to this place. Linked open data is sort of the next generation of controlled vocabularies. It is a means of connecting anything to anything else: concepts, people, places, etc. This is accomplished through something called the Resource Description Framework, or RDF. RDF creates links between content on the web using uniform resource identifiers and HTML. But unlike plain HTML, RDF describes the context of links on the web. And here's one way to imagine how linked open data works if you're unfamiliar with it. Here we have a little statuette of a lion from the Getty Museum, and there's a link that directs us to information that pertains to this lion. This is the sort of HTML link we might encounter on any website. We can read the wording of this link and infer what the relationship is, but sometimes the links are not so self-explanatory. Without RDF, the link is a mystery to the computer too. An HTML link does not establish the relationship between web pages. And that's where RDF comes in.
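At its core, RDF expresses statements as triples of subject, predicate, and object, each ideally a URI. Here is a sketch using the lion example; the URIs below are invented placeholders, not real Getty identifiers.

```python
# Sketch of RDF-style triples as (subject, predicate, object) statements.
# All URIs below are invented placeholders, not real identifiers.
triples = [
    ("http://example.org/object/lion-statuette",
     "http://example.org/prop/placeCreated",
     "http://example.org/place/somewhere"),
    ("http://example.org/place/somewhere",
     "http://example.org/prop/type",
     "http://example.org/concept/place"),
]

# Even this simple structure lets a machine answer a contextual question
# that a plain HTML link cannot: "which things have a place of creation?"
def subjects_with(predicate_suffix, triples):
    return [s for s, p, o in triples if p.endswith(predicate_suffix)]

print(subjects_with("placeCreated", triples))
```

The predicate is what a bare HTML hyperlink is missing: it states *why* the two resources are connected, which is what lets software aggregate statements from many collections into one graph.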
If we imagine that the lion, the place it was created, and the concept of place created are all things that we can link to, we have the basic idea behind linked open data. RDF triples define contextual relationships between content on the web, and they're the reason why the web is getting smarter. Not everyone is going to be in a position to utilize linked open data right now, but that does not mean that you cannot create the metadata that will be useful for these types of projects and more down the road. Controlled vocabularies are at the heart of linked open data, and you can start using those much more easily. There are data tools out there that are being developed to process structured and controlled metadata and convert that metadata to RDF, which is kind of exciting as well. It means you don't have to write the RDF triples yourself. You can just sort of run it through something like OntoMaton or OntoWiki, which we've linked in the resource library. These were developed primarily for research data, but there are new tools being developed every day. We have a really quick one-minute activity here, and I'm just going to have everyone participate in the classroom chat if they can. Having put such an emphasis on the value of control, I want to switch it up and talk about uncontrolled keywording. So don't spend too much time thinking about your word. Just give me a word that describes something about this picture, whatever pops into your mind. A picture is worth a thousand words. You don't even have to be right. It might be how you feel about this image. So user-added descriptors like word tags allow our users to tell us what we missed that might be important to them. As I hope you can see from this exercise, controlled vocabularies clearly have their place, but they have their limitations. They're often not exhaustive, and despite the use of technical and specific terms, they are never as narrowly specific as the keywords that might be dreamed up by a whole crowd of people.
We might recoil at the thought of purposely allowing user typos or inaccuracies into our descriptive records, but consider that not everyone will know how to articulate what they're looking for, particularly if they do not understand the catalog information the same way the cataloger does. Chances are your users have some words to express what they mean that are different from ours. In the previous webinar, Sarah mentioned that crowdsourcing projects are a great way to engage a lot of volunteers, and tagging collections is an excellent, simple task to give this kind of diverse group. I'm not going to go through these in too much detail, largely because I think I'm running later than I should be at this point in the webinar and I want to make sure I leave time for questions. So I'm just going to say you can purchase a database. A lot of databases come with social media tools built in now, and it might just be a matter of turning them on. You can download the code for buttons from Pinterest or Delicious and just see what happens. You can put an email address at the bottom of your record and invite questions about your catalog record. It's the easiest solution, but it may require a bit of work because you need somebody on the other end of that email address. You can also put your content on a third-party website that supports comments and tagging. A surprising number of well-respected institutions are using a variety of social media tools to reach audiences and solicit feedback. We see the Smithsonian here, the Model Lab, the Library of Congress, and a lot of smaller institutions are using these too. These are more than just tools for announcing events and attracting visitors. They can also be used to understand how your users are engaging with the catalog information that you put out there and where there might be gaps in that information. You do have to bear in mind that social media is most useful when people are paying attention to the social media accounts.
I've got a case in point here. I've compiled a couple of screenshots of an uploaded picture on the Smithsonian's Flickr page. There are a number of tags here in the middle that complement the Smithsonian's brief record and make this image discoverable on Flickr. But not all the tags are accurate, and that is a risk you take by opening it up to the crowd. However, in prepping this webinar it was pointed out to me that the catalog information is actually not accurate in this record. So in this case, we can see how social media could be used as a great feedback mechanism for attracting expert opinions. And I'm looking down at the chat box now for Louise's comment; she told me all about why this is probably not an example of ikat weaving, which is in the Smithsonian's description of the item. Based on this feedback, the cataloger could enhance this entry and provide the end user with the information needed to understand, you know, which of the tags are correct. What does the future hold? Everyone here is at different stages and levels of expertise in their cataloging, but hopefully you are all beginning to see answers forming to those planning questions I posed at the start of the webinar. Even if you are only at the stage where you can create spreadsheets, you can begin to use standards and controlled vocabularies that will allow you to easily upload, use, and share that information later. If you have great records but aren't sure how to begin linking them, you can use some of the RDF conversion tools and social media tools that will allow you to reach out broadly. And I'm just leaving you here with a few points that I gleaned from an article in the journal First Monday called Moving Towards Shareable Metadata. I thought this summarized a lot of what I hope you will take away from this webinar. Cataloging is most effective when your cataloging is consistent and coherent.
When you take those extra steps to provide context, when you reach out broadly to communicate your records, and most of all, looking at the top and bottom points here, when the content is optimized to the best of your ability and your records conform to standards. I just want to thank you for this opportunity, and thank this class for humoring me. I enjoyed hearing what you had to say. Certainly, if you have any more questions, particularly because I'm cutting you off, feel free to email me, and I'll just make my email pop up here. But also use future sessions to talk amongst yourselves, because clearly there are lots of different levels of expertise here, and there may be answers from the community.