Welcome to this project briefing on bringing linked data into libraries via Wikidata: the PCC Wikidata pilot project at Texas State University. I am Mary Aycock, the Database and Metadata Management Librarian, and I am joined by Nicole Critchley, Assistant Archivist, University Archives, and Amanda Scott, Wittliff Collections Cataloging Assistant. As a quick outline, we will discuss the problem of hidden metadata, introduce the various projects and their methodologies and impacts, and conclude with challenges and a summary. Years ago, there was much focus in librarianship on uncovering hidden collections, but metadata hidden in silos is also a significant discovery problem, especially considering how much time and how many limited financial resources libraries have invested in describing their collections. These silos include legacy schemas and formats, such as traditional MARC records, PDF finding aids, and especially authority records. They often require extensive training to use and have limited use and visibility. Pictured below is a screenshot of one of the authority records for our faculty. Such hidden data cannot be used to its full potential. Over the past decade, linked data has been proposed as a way to provide context and connections in our metadata, but it has proven elusive for most institutions to implement. Enter Wikidata, which has caught the attention of many GLAM institutions as a relatively low-barrier way to work with linked data. Pictured to the right is a screenshot of the first few properties for standard scenarios. The graphical interface lets editors work without knowing linked data concepts such as triples or RDF. Wikidata was created in 2012 as an initiative of the Wikimedia Foundation and runs on Wikibase, which is built on MediaWiki, the same software that runs Wikipedia. Both the GLAM world and Wikidata believe in the importance of knowledge creation and curation.
You can query Wikidata using SPARQL, and there are quite a few user-friendly ways to learn and run queries. Last but not least, it is not a matter of building something and hoping that search engines will find it: Wikidata is already used by Google and by other agents such as Siri. In the summer of 2020, the Program for Cooperative Cataloging (PCC) invited any GLAM institution, whether or not it was a PCC member, to participate in a Wikidata pilot project of its choosing. We committed to this pilot for all the reasons listed here. It provided both a learning experience and a way to participate in linked data that generated clear results in terms of increased visibility. Next, I will introduce the three pilot projects. Despite their diverse focus, all three share the key purpose of increasing visibility. The first involved creating Wikidata items for faculty at our institution. The second focused on an oral history collection in the University Archives. Finally, we strove to increase the visibility of the Wittliff Collections by linking to their PDF finding aids. For the faculty project, we were very interested in adopting an approach that has been gaining traction lately: identity management. As you might guess from its name, identity management uses identifiers instead of text strings to distinguish authors. First, in linked data parlance, linking to things rather than strings is preferable. Second, linking to data outside library silos facilitates the connections on which linked data so critically depends. Finally, identity management in a linked data environment empowers institutions, such as university libraries, to create items for their own communities. In many ways, we are the most motivated, and often the best positioned, to create such items. Speaking of identifiers, Wikidata is increasingly known as a hub of identifiers. Here's a screenshot of just some of the identifiers found on one faculty item.
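Since the talk mentions both SPARQL and Wikidata's role as an identifier hub, here is a minimal sketch in Python, assuming the public Wikidata Query Service endpoint, of building a query that lists the external identifiers on one item. `Q42` is only an example item ID, and the function names are hypothetical; the code builds the query and request URL but does not send anything.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def identifiers_query(qid: str) -> str:
    """Build a SPARQL query listing every external-identifier statement on one item."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    # wikibase:directClaim maps a property entity to its wdt: predicate;
    # restricting the property type to ExternalId keeps only identifiers
    # (VIAF, Library of Congress/NACO, ORCID, and so on).
    return f"""SELECT ?propertyLabel ?value WHERE {{
  wd:{qid} ?claim ?value .
  ?property wikibase:directClaim ?claim ;
            wikibase:propertyType wikibase:ExternalId .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY ?propertyLabel"""


def request_url(query: str) -> str:
    """Build a GET URL asking the endpoint for SPARQL JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(request_url(identifiers_query("Q42")))
```

Pasting the query text into the Query Service's web interface gives the same result without any code, which is the more user-friendly route the talk alludes to.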
Next, Nicole will speak about her project. For the University Archives project, the goal is to bring attention to the oral histories and the rich demographic data that exists for their participants. The archives has bits and pieces of demographic data in several places: spreadsheets that are not publicly available, the website, the collection itself, and the PDF finding aid, which is not easily searched. Wikidata is searched by more people, and we wanted to go where they already were; it also helps us connect our oral histories to other archival collections, either at Texas State or at other institutions. The Wittliff Collections project focuses primarily on adding links to finding aids, creating a Wikidata item where none currently exists. The intent of the project is to generate more exposure for the finding aids, their associated collections, and the Wittliff Collections as a whole, raising awareness of Wittliff archival holdings and associated entities while ultimately creating a single source for all publicly available biographical data. And now I'll hand it over to Nicole to talk about methodologies. Here's how we started. Our methods varied, but the projects shared some commonalities. We worked independently according to our expertise; we all researched and compiled data from multiple sources and created spreadsheets for our data. We also made data models specific to each project. The PCC pilot encouraged all participants to set up a WikiProject page, which proved extremely helpful for documentation purposes. For example, here is the data model on our WikiProject page for the oral history project, with the properties we wanted to make sure we included as well as usage notes. The more data you add, the more likely an item is to show up in query results. Wikidata has a few properties of note that are useful for special collections and archives.
When we first started this project, there were two relevant properties: has works in the collection and archives at. Through collaboration in the PCC pilot project, a new Wikidata property, oral history at, was proposed and approved with community support last spring. The Wittliff Collections and University Archives use a combination of these properties: archives at if there is a full finding aid at Texas State or another institution, has works in the collection, and now oral history at for oral histories and interviews within collections. Once we settled on a data model for each project, we started editing items, either manually or with tools such as OpenRefine or QuickStatements, to create or edit Wikidata items. The goals for each project are listed here. The Wittliff Collections archives are approached individually, meaning the 152 archives in the Southwestern Writers Collection must each be analyzed separately to determine whether an entry is needed. Only seven of those 152 remain to be analyzed; once they are complete, the project will shift to the photography portion of the Wittliff. Each entry will include a link to both the Wittliff Collections and the University Libraries' repository, to prevent loss of access while finding aids are migrating to ArchivesSpace. However, because archives are constantly being added to the collections, the need to monitor both sites for new holdings and newly added biographical data remains. Now Mary will talk about the impacts. So what impacts did we observe? For the faculty project, we exceeded our goal of creating or enhancing 100 faculty items, thanks to a number of staff including our music cataloger, and we're still going strong. Pictured is a snippet of a query result transformed into graph form. In addition, we link faculty to any scholarly articles found on Wikidata through a tool called Author Disambiguator.
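To illustrate the spreadsheet-to-QuickStatements step, here is a minimal sketch in Python that turns CSV rows into QuickStatements V1 (tab-separated) commands. The column names (`qid`, `label`, `archives_at`, `has_works_in`, `finding_aid_url`) are hypothetical, not the project's actual spreadsheet layout. `P485` is archives at and `P6379` is has works in the collection; attaching the finding aid URL as a reference URL (`S854`) is only an assumption about how one might cite it. The oral history at property is left out, values are assumed not to contain double quotes, and the QIDs in the example are placeholders.

```python
import csv
import io

INSTANCE_OF, HUMAN = "P31", "Q5"
ARCHIVES_AT = "P485"
HAS_WORKS_IN_COLLECTION = "P6379"


def commands_for_row(row: dict) -> list[str]:
    """Convert one spreadsheet row into QuickStatements V1 commands."""
    def cell(name: str) -> str:
        return (row.get(name) or "").strip()

    commands = []
    subject = cell("qid")
    if not subject:
        # No existing item: create one and refer to it as LAST afterwards.
        subject = "LAST"
        commands.append("CREATE")
        commands.append(f'LAST\tLen\t"{cell("label")}"')  # English label
        commands.append(f"LAST\t{INSTANCE_OF}\t{HUMAN}")
    if cell("archives_at"):
        statement = f"{subject}\t{ARCHIVES_AT}\t{cell('archives_at')}"
        if cell("finding_aid_url"):
            # Assumption: cite the finding aid as a reference URL (S854).
            statement += f'\tS854\t"{cell("finding_aid_url")}"'
        commands.append(statement)
    if cell("has_works_in"):
        commands.append(f"{subject}\t{HAS_WORKS_IN_COLLECTION}\t{cell('has_works_in')}")
    return commands


def spreadsheet_to_quickstatements(csv_text: str) -> str:
    """Read CSV text with a header row and emit one command per line."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        commands.extend(commands_for_row(row))
    return "\n".join(commands)


if __name__ == "__main__":
    sample = (
        "qid,label,archives_at,has_works_in,finding_aid_url\n"
        ",Jane Doe,Q111111,,https://example.org/fa/1\n"  # placeholder data
    )
    print(spreadsheet_to_quickstatements(sample))
```

Generating commands from the spreadsheet, rather than typing them, keeps the spreadsheet as the single record of what was sent, which helps later when items need to be revisited.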
Pictured here is a screenshot from yet another tool, Scholia, which provides an interface that visualizes various queries. In this case, the article is linked to a particular faculty member. On a college campus, you often see news reports of faculty accomplishments, and these notices provided another opportunity to check whether faculty had, or needed, a Wikidata item. This one led to the enhancement of the item for a microbiologist at Texas State. So not only is this project helping us get the word out about faculty research, it also helps us feel more integrated and informed about the work going on at our university. Dr. Clay Green in the Biology Department was upstaged by shoes of the same name in an initial Google search, before work was done on his Wikidata item. Two weeks after the Wikidata work, Dr. Clay Green became the first result, which also included a small knowledge panel on the right. Next, Nicole will speak about the impact of the oral history project. Only three of the oral history participants had Wikipedia or Wikidata entries when I initially surveyed them. Now all 70 participants have items on Wikidata, with links to our oral history website and to relevant collections at the university and elsewhere, as well as to identifiers like NACO and SNAC. In addition to items for participants, I also created other relevant items, such as one for the Piper Professor Award, so that I could link to it. This has added visibility to Texas State University's website as well as to our ArchivesSpace instance. For example, for Ruth Bain, we don't have a collection related to her besides her oral history interview, but the Austin History Center does. So I added the archives at property with their finding aid link, as well as the has works in the collection property with our interview. This links the materials about her together in one place. I do not have great statistics for the oral history website, which is what I was actually linking to in the Wikidata items.
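Author Disambiguator works by replacing author name string (P2093) statements on scholarly articles with author (P50) statements that point to a person's item. As a rough sketch in Python (all function names hypothetical), the two queries below find articles already linked to a faculty item and articles that still carry only a matching name string. Exact string matching will also catch other people who share the name, which is why a human disambiguation step is still needed.

```python
def sparql_string(text: str) -> str:
    """Quote a Python string as a SPARQL string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def linked_articles_query(author_qid: str) -> str:
    """Articles whose author (P50) statement already points to the item."""
    return f"""SELECT ?article ?articleLabel WHERE {{
  ?article wdt:P50 wd:{author_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def name_string_query(name: str) -> str:
    """Articles that still list the author only as a name string (P2093)."""
    return f"""SELECT ?article ?articleLabel WHERE {{
  ?article wdt:P2093 {sparql_string(name)} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


if __name__ == "__main__":
    print(name_string_query("Clay Green"))
```

Comparing the two result sets for one person gives a quick to-do list: articles in the second set that belong to the faculty member are candidates for Author Disambiguator.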
But we can see an effect on transcript item views on our digital collections platform, even though we were not linking directly to it. Links and data collections started in September 2020. I started to learn and create the data model in Wikidata in October, but the majority of the entries were created and edited from December through March. While item views went up, downloads went down, presumably because users were coming from the website page, which gives two options for viewing a transcript: a PDF on the digital collections site, or another HTML page. We suspect that users, upon finding out that they had to download the PDF to view it, went back to the website to view the HTML version. Because of the difficulty of measuring impacts on end users, the Wittliff Collections project analyzes impacts on the technical user. To maintain accuracy and availability, entries are being revisited to assess the need for additional data, which cuts into the time allotted for analyzing new collections. Furthermore, highly visible and popular aspects of the collections are being assessed to add value to current items, such as adding more cast member properties to the Lonesome Dove Wikidata item. At the forefront of impacts on the technical user is the recognition that more linked data needs to be created to yield high returns and to inspire Wikidata creation that benefits all users. The inclusiveness of the Wittliff Collections and Wikidata sparks the creation of even more access points. Recently, the project began adding infoboxes in Wikipedia for individuals with existing pages. Also, after errors were discovered in Wikipedia pages that reference the Wittliff Collections, editing Wikipedia pages became an additional aspect of the project.
Rather than focusing solely on Wittliff entities, as was the case at the project's beginning, the project now also links existing pages to newly created pages for Wittliff entities. For example, the existing Andy Adams encyclopedia entry was authored by Wilson M. Hudson, whose papers are a Wittliff archival holding but who lacked a Wikidata item. Creating Hudson's item allowed the two to be linked as author and notable work, and vice versa. Incorporating the editing of existing pages into the project also allows bot data errors to be discovered and corrected, as with author Christopher Cook, whose Twitter data previously belonged to an actor of the same name. Additionally, although the information was publicly available, Christopher Cook's item lacked the spouse property, even though the spouse had an individual item. Cook's Twitter data was corrected, and a spouse property was added to both Cook's item and the spouse's item. For a visual of the Wittliff project's contribution to Wikidata compared with all U.S. participants, here's a recent SPARQL query limited to items that have an archives at property. But as Nicole mentioned, the project often uses the has works in the collection property rather than archives at. Now for the challenges and conclusions. A major concern for Texas State University's project participants is the learning curve. For example, choosing properties is often confusing, and the provided definitions can be unclear, leading to incorrect properties and flagged data. Additionally, headings that are valid in the collections are often not defined as such in Wikidata. For example, the Texas Institute of Letters has been defined in finding aids as both a society and an award, but Wikidata defines it only as a society. Therefore, while Wikidata allows the heading to be used in the award received property, the statement gets flagged for potential issues. Skills develop over time.
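The comparison query described above is not reproduced in the talk, but a query of that general shape might look like the following Python-built sketch (function name hypothetical). It counts items per repository linked through archives at (P485), optionally also has works in the collection (P6379) via a SPARQL property-path alternative, and keeps only repositories whose country (P17) is the United States (Q30). Treating U.S. repositories as a stand-in for U.S. participants is an assumption, and a broad aggregate like this may be slow or time out on the public endpoint.

```python
def holdings_by_repository_query(include_has_works: bool = False) -> str:
    """Count items per U.S. repository linked through archives at (P485)."""
    # The "|" property path matches either predicate in a single pattern.
    predicate = "wdt:P485|wdt:P6379" if include_has_works else "wdt:P485"
    return f"""SELECT ?repository ?repositoryLabel (COUNT(DISTINCT ?item) AS ?items) WHERE {{
  ?item {predicate} ?repository .
  ?repository wdt:P17 wd:Q30 .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?repository ?repositoryLabel
ORDER BY DESC(?items)"""


if __name__ == "__main__":
    print(holdings_by_repository_query(include_has_works=True))
```

Including P6379 matters here because, as noted above, the Wittliff project often uses has works in the collection rather than archives at, so a P485-only count understates its contribution.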
However, this creates a need to revisit earlier pages to apply these skills. And now Nicole will share more conclusions and challenges. Data maintenance is not a new thing; we have to do it in our silos, too. With Wikidata, we have to keep up with the changes. For example, we recently launched our ArchivesSpace instance, and now we have to go back and edit the finding aid links. The regional finding aid aggregator, TARO, has also moved platforms, and the redirect links will only be good for so long. Finally, Wikidata's open, collaborative nature can be seen as both a strength and a challenge. For instance, we can't do it all, so it's great to see that others can enhance our items. But that's a double-edged sword, as you can't restrict the data added, even when you would prefer to for personally identifiable information or for privacy reasons. We deliberately decided not to add birth dates to our faculty items, even though they appear in the authority data. However, a particular bot did not get that memo: it retrieved the birth date from VIAF and inserted it into the Wikidata item. This type of change is representative of the complexity of working on a large cooperative platform. As for sustainability, it can be difficult to find time in an existing workflow to do this work. However, library silo systems aren't sustainable either, and they are relatively invisible. Where does it make the most sense to put our efforts? Wikidata exposes our data to the public, and others can build upon it, breaking down these silos; that's what makes Wikidata a good resource. Some further avenues include adding citations, whether books or articles, to Wikidata. Ideally, the library community would also come up with ways to assess Wikidata work, so we can have evidence that this work is valuable and should continue. What kind of best practices can we develop? We'll also have to contend with maintaining our links as we migrate platforms.
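Because anyone, including bots, can add data, periodic checks help catch statements a project chose to omit, such as the birth dates mentioned above. Here is a minimal sketch in Python, assuming item JSON in the shape returned by Wikidata's `Special:EntityData/<QID>.json` pages (function names hypothetical). It reports any date of birth (P569) values it finds and only builds the download URL rather than fetching it.

```python
def entity_data_url(qid: str) -> str:
    """URL of the JSON export for a single Wikidata item."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def birth_dates(entity_json: dict) -> dict[str, list[str]]:
    """Map each item ID to the date of birth (P569) values it carries."""
    found: dict[str, list[str]] = {}
    for qid, entity in entity_json.get("entities", {}).items():
        for claim in entity.get("claims", {}).get("P569", []):
            snak = claim.get("mainsnak", {})
            # "somevalue" and "novalue" snaks have no datavalue to report.
            if snak.get("snaktype") != "value":
                continue
            found.setdefault(qid, []).append(snak["datavalue"]["value"]["time"])
    return found


if __name__ == "__main__":
    print(entity_data_url("Q42"))
```

A script could fetch each faculty item's JSON, run this check, and pass any hits to a person, since whether to remove or keep the statement is a community and policy decision rather than something to automate.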
Editing Wikipedia, whether through infoboxes or articles, seems like the next logical step to make the most of the work in Wikidata, since Wikidata is being used to populate Wikipedia. Finally, a question that's been percolating is how to make Wikidata enhancements benefit our own databases, so that the information makes a round trip, so to speak. In conclusion, while the learning curve is steep and the work can be time-consuming, with a rabbit hole of possibilities, Wikidata can increase visibility for our collections and our faculty members. It allows us to be part of a forward-thinking data initiative instead of accepting the status quo. Ultimately, doing this work allows us to serve our institutions better. The next few slides have resources pertinent to this project and a reading and information list on Wikidata. Let us know if you have any questions.