Hello, my name is Hugh Patterson, and I am the Coalition for Networked Information (CNI) Doctoral Fellow for the 2022-23 academic year. I was asked to share a bit about my research agenda, so I'm excited to share the work I've been doing, some of my goals, and why I'm pursuing them. I want to start by thanking you for supporting me and advancing the scholarly activities I've been involved with. It's because of this fellowship that I've been able to do the things I've done this year, and I really appreciate that. The title of my talk today is "Enriching Library and Archival Records with Ethnolinguistic and Minority Language Identification," which is a theme of my research; my educational background is in linguistics. This is my first year as a PhD student in Information Science at the University of North Texas. This year, in terms of scholarly activities, I've taken a couple of courses on cataloging, where we covered topics like MARC, the IFLA-LRM model, BIBFRAME, and RDA, and in Metadata I and Metadata II we talked about VRA Core, Dublin Core, MODS, RDF, and linked data. I've also done a practicum at the University of Oregon's Knight Library, where I've been involved in discussions on institutional repository infrastructure. Over this academic year, I've given about 18 academic and professional presentations, such as conference presentations, and I have three accepted publications. I've been busy, and my family and my kids keep me busy too. I rarely mention my kids in my professional work, but in this context I think they inform my research curiosities. In cultural heritage preservation, we hope that new generations will engage with resources that inform them about the past, and that they will use those resources to shape the future. My kids give me an everyday example of new learning and knowledge assimilation.
So they give me a laboratory of new and cross-generational information transfer. It's invigorating to watch them discover the world in new ways, including through language. Language, I think, is the metadata of life. It provides indicators of relationship, of community vitality, and of individual and communal flourishing. There are over 7,000 languages used around the world, each with one or more ethnolinguistic communities using it. When it comes to language resource discovery, meaning resources about a language or demonstrating a language, and to acquiring the resources a community needs to take its next steps in language development, we're often talking about smaller communities, perhaps with fewer than 40,000 speakers. That's the long tail of languages: a few languages have billions of speakers, others have millions, and then there is this long tail. When I think about how these smaller communities engage with language and language resources, I want to see them flourish and see their cultural heritage stewardship and language development goals reached and resourced. Information access is a big part of enabling ethnolinguistic communities to flourish, and discovery tools and pathways are the means that lead people to those resources. That's why I get involved with projects in the digital library space and with projects in user experience design and discovery: those components give access to, facilitate, and encourage community activities. So outside of language, you'll see discussions of digital libraries, ethics, user experience design, and discovery processes in my scholarship. Of course, in information science we deal a lot with metadata, so you'll see me talk a little bit about metadata.
Sometimes metadata is the object of our research and therefore its focus. But I've been learning that there is perhaps another perspective: metadata is the record of an administrative process. Administrative processes have different goals and involve different activities, and I find that we lead through the metadata we record about resources. Because metadata doesn't exist without administrative oversight, it exists as part of a system, a system full of choices, often bounded by the user experience we seek to create or by the limits of our funding. So different administrative efforts produce different sets of metadata, shaped by their semantics or coverage. We might use fancy terms like "metadata quality" to cover these divergent scenarios. It's in this context that my research sits: I'm working between these administrative processes to expose language resources whose current or past processes never fully developed metadata identifying a connection to a specific language community. So I'm looking for ways to move unstructured clues about resources into structured elements. I'm excited to work with graphs of language identification indicators for the benefit of minority language users. Those may be fancy terms, but I want to find ways to derive structured information from free text and context, based on the terminology used around language identification. This is cross-linguistic work, because language names differ from language to language. For example, German is called allemand in French and German in English, but in German it has its own autonym, Deutsch.
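To make the cross-language naming problem concrete, here is a minimal sketch, assuming a hand-built table (the name `NAME_VARIANTS` and its contents are hypothetical) that lists exonyms and autonyms under one ISO 639-3 code. Names are folded to lowercase with diacritics stripped, so spelling variants like "français" and "francais" resolve to the same structured identifier.

```python
from __future__ import annotations

import unicodedata

# Hypothetical lookup table: each language is keyed by its ISO 639-3 code and
# lists names for it in several languages (exonyms) plus its own name (autonym).
NAME_VARIANTS = {
    "deu": ["German", "allemand", "Deutsch"],
    "fra": ["French", "Französisch", "français"],
}


def fold(name: str) -> str:
    """Lowercase and strip diacritics so 'Français' and 'francais' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def build_index(variants: dict[str, list[str]]) -> dict[str, str]:
    """Invert the table into a folded-name -> ISO 639-3 code index."""
    index = {}
    for code, names in variants.items():
        for n in names:
            index[fold(n)] = code
    return index


INDEX = build_index(NAME_VARIANTS)


def identify(name: str) -> str | None:
    """Return the code for a language name in any listed language, or None."""
    return INDEX.get(fold(name))


print(identify("Allemand"))  # deu
print(identify("francais"))  # fra
```

A flat dictionary is enough for a sketch; a real crosswalk would also need to record which source (LCSH, ISO 639-3, a scholar's usage) each variant came from.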
So this involves language identity information and information extraction from records, where records exist in multiple or different languages, moving from free text to structured text; in some ways this borders on information classification. My central question is: how do we go from unstructured data to structured data in a context where we want trusted information? To answer this question, I'm pursuing research on the use of SORI and weighted graphs within search processes, and I'm applying it in the context of library and archive catalogs. I'm looking for language resources that are not designated as being about an ethnolinguistic community or a language, and I'm looking in the parts of the record that are more free text. Let me give you the example of Karbi. Karbi is a language of northeast India. Many people today who know about Karbi use the term Karbi, but in the Library of Congress Subject Headings it's known as the Mikir language. It has a Library of Congress Classification number, but that number does not yet have a URI associated with it. In ISO 639-3, a standard for identifying language entities, it is referred to as Karbi and has the code mjw. There's a second, related entity in ISO 639-3: Amri Karbi. It has been known as Amri Karbi since 2006; before that it was just Amri, and it has a different code to indicate that it's a different entity. The Library of Congress Subject Headings have some near matches for Karbi, such as the heading for the Karbi as an Indic people, or Karbi religious practices. However, scholars writing about this language might use terms like Karbi Karbakh, Hills Karbi, or Plains Karbi, and refer to different languages in these ways.
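Moving from free-text clues to structured elements for the Karbi case might look like the following minimal sketch. The variant table is hypothetical: "mjw" is the code given above, while "ajz" is assumed here for Amri Karbi and should be checked against the ISO 639-3 registry. Longer names are tried first, so "Amri Karbi" is not mistaken for plain "Karbi", and each result keeps the field and matched text as provenance, since trusted information needs to be traceable.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical variant table for the Karbi example. "mjw" is the ISO 639-3 code
# for Karbi; "ajz" is ASSUMED for Amri Karbi and should be verified against the
# ISO 639-3 registry. "Mikir" reflects the LCSH heading.
VARIANTS = {
    "Amri Karbi": "ajz",
    "Amri": "ajz",  # name used before 2006
    "Karbi": "mjw",
    "Mikir": "mjw",
}


@dataclass(frozen=True)
class LanguageClue:
    code: str     # ISO 639-3 code inferred from the text
    field: str    # record field where the clue was found (provenance)
    matched: str  # exact text that matched


# Try longer names first so "Amri Karbi" is not misread as plain "Karbi".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(v) for v in sorted(VARIANTS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
_LOOKUP = {name.casefold(): code for name, code in VARIANTS.items()}


def extract_clues(record: dict[str, str]) -> list[LanguageClue]:
    """Scan each free-text field and return structured, attributable language clues."""
    clues = []
    for field, text in record.items():
        for m in _PATTERN.finditer(text):
            name = m.group(1)
            clues.append(LanguageClue(_LOOKUP[name.casefold()], field, name))
    return clues


record = {
    "title": "A grammar of Karbi",
    "note": "Includes comparison with Amri Karbi.",
}
for clue in extract_clues(record):
    print(clue)
```

Plain string matching like this cannot tell whether "Karbi" names the language or the people; that ambiguity is what weighting the clues is meant to address.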
Looking at this in a more traditional graph view, the cultural components are green on the right-hand side, Amri Karbi and its related terms are purple at the bottom, and the terms related to the Karbi language are gray. So there are several different things we could look for across records, in free text or in identifiers, to determine relevance. Consider two specific resources. The first is A Grammar of Karbi, an award-winning grammar written by a student at the University of Oregon. The second is a resource at the Library of Congress, a collection of conference papers: The Status of Women in Tribal Societies, a Karbi and Dimas. Neither of these resources actually has Karbi in the structured data describing its subject. The subjects for A Grammar of Karbi are descriptive, historical, and comparative linguistics, Northeast Indian languages, Southeast Asian languages, Tibeto-Burman languages, and typology, but, interestingly, the actual language it's about doesn't appear in the subjects. The same thing happens with the resource at the Library of Congress. The links are on the slides, and my slides will be available from my website. Now, suppose we look for Karbi across a couple of the aggregators used by the community that collects materials about languages. The Open Language Archives Community (OLAC), a consortium of archives that aggregates language resources, has three known resources on Amri Karbi and 11 on Karbi. Glottolog has a few more; its coverage is a little better. The Virtual Language Observatory, the EU CLARIN interface for discovering language resources, knows about nine items, but it doesn't clearly distinguish between Amri Karbi and Karbi.
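The idea of weighing different clues, whether identifiers, headings, bare names, or cultural terms, toward a relevance judgment can be sketched as a small weighted graph. This is a minimal illustration, not the method described in the talk: the edge weights are invented, the clue strings are hypothetical, and combining weights with a noisy-OR (relevance = 1 minus the product of (1 - w) over matched clues) is a technique swapped in here for demonstration.

```python
from __future__ import annotations

# Hypothetical weighted graph linking observed clues to language concepts.
# The weights are ILLUSTRATIVE guesses: structured identifiers count heavily,
# bare names moderately, cultural headings weakly.
EDGES: dict[tuple[str, str], float] = {
    ("ISO 639-3:mjw", "mjw"): 1.0,
    ("Mikir language", "mjw"): 0.9,
    ("Karbi", "mjw"): 0.6,                 # could name the language or the people
    ("Karbi (Indic people)", "mjw"): 0.3,  # cultural heading, weaker signal
    ("Amri Karbi", "ajz"): 0.9,
    ("Amri", "ajz"): 0.5,
}


def relevance(clues: set[str]) -> dict[str, float]:
    """Combine clue weights per language with a noisy-OR:
    relevance = 1 - product(1 - w) over the clues found in a record."""
    miss: dict[str, float] = {}
    for (clue, lang), weight in EDGES.items():
        if clue in clues:
            miss[lang] = miss.get(lang, 1.0) * (1.0 - weight)
    return {lang: 1.0 - p for lang, p in miss.items()}


# A record whose free text mentions "Karbi" and whose subjects include the
# cultural heading, but which carries no language code.
print(relevance({"Karbi", "Karbi (Indic people)"}))
```

Noisy-OR treats each clue as independent evidence, so several weak clues add up without any single one having to be decisive; the scores stay between 0 and 1 and could be used to rank records for a searcher.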
The same goes for Google Books and WorldCat, though WorldCat has rather large coverage: when you type in Karbi, you get about 600 items. I didn't check all 600 items, but I did check that all 38 Google Books results were about Mikir/Karbi. So they're allegedly about Karbi, but whether they have the right Library of Congress subject heading associated with them is another valid question, and one that's still open. So we can see that discovery is shaped by our understanding of concepts, and one of the things I'm exploring is how to develop relevance in order to expose these resources to searchers looking for materials about the smaller languages of the world, the languages with smaller populations. So, my name again is Hugh Patterson, and you can reach me at iithp3.me. My website is Hugh for us, and it's HTTP, not HTTPS. I'm interested in discovery processes, networks and graphs, metadata, digital libraries, ethics, language, and user experience design. I'm also interested in collaborating with you, so let's do something together. Thank you for supporting my scholarly activities this year; I would appreciate the opportunity to meet many of you. Thanks a lot.