All right, we'll go ahead and get started. Thank you, everyone, for coming to a morning session. I'm sure everybody had a good evening last night, so thanks for getting up and getting your coffee. I'll be talking about what the role of a taxonomist is in web development projects, specifically for Drupal. I had the opportunity to give this presentation in Pittsburgh at the US DrupalCon, and it's really exciting to be here as well in Lille. We are a small group. There is a Q&A app, if you want to put your questions in. But I think it'll be very easy to just have a live question session at the end. So let us know if you do want to use the app. Otherwise, we'll just pass the mic around. Great.
My name's Michelle Ann Jenkins. I've been doing Drupal work for 17 years and four weeks as of a month ago, so I think this is now eight weeks. I should have updated that slide. But I was originally a programmer working on content management systems and had the opportunity to work on Drupal 5, 6, 7 before switching to more of an information architecture and taxonomy perspective. So my background is in linguistics and WCMS programming. I have over 20 years now on one side or the other of web development projects. And in 2005, I got a Master's of Library Science at McGill to help support that transition to information architecture. I work with a company called Dovecot Studio. We're based in Montreal, Quebec, but our clients are worldwide, including quite a few here in Europe. We do everything related to taxonomy, metadata, search, digital asset management, product information management, and we're now starting to branch into things like knowledge graphs and ontologies as well. We are not Drupal-specific, but as I will explain, I love it when we get to work with Drupal because it plays so well with taxonomy.
So even if you are familiar with taxonomy, this question comes up quite often, so I wanted to make sure we got it out of the way early on.
The word taxis means arrangement and organization. Derma means skin. That's why there's taxidermy, and that is different from taxonomy. So we do get quite a few taxidermy jokes in our profession, and I finally went and looked it up and was like, why are these words so similar? That's the explanation. So we're going to be looking at taxonomy today, which is a method of arranging and organizing things.
So what makes a taxonomy a taxonomy and not just a list of words or hashtags or free keywords? The key points are that it is governed. We often say it is controlled. So each concept or idea exists once in the taxonomy with an agreed preferred label, and then it is structured. It has some semantic relationships to other things in the taxonomy, usually represented in a hierarchy. We also support synonyms and alternate labels, so when we define that preferred term, we're not ignoring the fact that there are many words for the same concept or sometimes many concepts that have the same label. We capture all of that richness in term metadata, and you can already start to see how Drupal, which treats everything as an entity, works really well with this idea of having a list of words that have metadata attached to them. But I wanna focus a lot today on the governed and controlled part because that is what gets left out.
So what makes a taxonomy a good taxonomy is a foundation of governance based on practices and standards. So there's an ANSI standard for the development of taxonomies. It's about 45 pages, an extremely dry technical spec, but it is the foundation of a lot of the work we do. We take that standard and then we look at the user needs, the goals of the organization or business, and what the taxonomy actually needs to do from a technical use case perspective. So the governance is the guidelines for how you're gonna synthesize all of these different aspects. Taxonomy is about balancing these different needs as well.
Why taxonomists like Drupal?
So out of the box, you have better taxonomy management in Drupal than in any of the other content management systems, in my opinion. WordPress lets you have categories. SharePoint has its term store. I can do a whole other presentation about issues with that, but Drupal out of the box has the concept of taxonomy and actually calls it taxonomy. It's very easily extensible. It plugs into the overall content architecture really well. Drupal plays well with others. So if you have a rich environment where you have a digital asset management system, maybe you have other content management systems, maybe you have products, Drupal likes to talk with other systems quite easily as well. And as I mentioned, everything is an entity. So it's really easy to ramp up into what we would consider fairly advanced ontology, taxonomy, and semantic structures without writing any code and without having to do any extensions or changes. Drupal designers and developers get taxonomy. They see the usefulness in things like views and display choices and navigation. And from just a quick look I took earlier, there are over 250 stable modules that mention leveraging taxonomy. So a really rich world to play with. If you have a good taxonomy and you're using Drupal, there's a lot you can do quite quickly.
I'm not gonna go into the details about implementing taxonomy in Drupal. Again, that could be a whole other presentation, but one common use case that we see is views. Views obviously use metadata to drive content display and they work really great with taxonomy. You can even start doing things where you leverage the relationships in a taxonomy to say, show me things that are related to something that is related to this. And you can start knowledge graphing your way out through your content, and then you have total control over what shows up where, whether you're making a landing page that's topic-based or whether you're using many blocks to build personalized dynamic content displays.
You obviously have your filtering, so your faceted search. You can expose the tags for a piece of content so that people can navigate between related items based on, hey, I picked this because I'm interested in these countries or topics or organizational entities, and then they're able to explore many navigational points of entry. They're not just using your navigation. It works really well if you're landing on a page coming from a Google search. It gives you a real sense of what a site is when you see that sort of metadata exposed. More and more we're seeing people leverage synonyms. So if somebody starts talking about amusing things or whimsical things, you can say, hey, that's all funny. We're gonna guide you over to this area here. So you can get quite advanced in how you leverage the display of taxonomy terms.
Beyond Drupal implementation specifically, taxonomy is very useful for content strategy planning. So having your editorial calendar, looking at your personas, what topics are people interested in? Okay, how much content do we have for each of those topics? Are there gaps, are there opportunities? Obviously there's a whole world of opportunity with search ranking enhancements and search result page display. I wanna highlight that navigation should be separate from taxonomy. They are both hierarchical and I know Drupal lets you just use taxonomy as navigation. But when we're talking about semantic, governed taxonomies, that has a very different governance process than your more editorial user journey navigation. So they support each other, but I would say they're not always the same thing. You also have a lot of chances for reporting and analytics, looking at how content performs based on controlled taxonomy; syndicating your content, so sharing it either inside your organization or with other organizations; and workflows, so looking at permissions and roles and taxonomy.
So quite a rich area to explore outside of your basic display of this content next to something that it's related to.
So taxonomists, as we've already covered, we don't mount animal heads to the wall. We often have a master's of library science. We're increasingly becoming more technical. It's a really big deal right now. I'm actually going to Taxonomy Boot Camp, which is the DrupalCon for taxonomists, in Washington, DC in November, and everybody's talking about how do we get plugged into ChatGPT and artificial intelligence and auto classification and semantics. So definitely getting more technical and getting into that ontology and semantics area. A taxonomist should be familiar with UX research. If you went back 10 years, taxonomists were really archival librarians or catalogers and really thinking more about information retrieval patterns from a kind of academic perspective, but now they should generally be pretty plugged into doing things like card sorts and user interviews and user testing. They might be familiar with a specific WCMS, but a lot of times what we do is technology agnostic. So we design a good taxonomy and then we work to implement that in a system. Someone like me, I love Drupal, I've focused a lot on Drupal; someone else might be a little bit more focused on SharePoint or some digital asset management systems. But generally taxonomists are not specialized in the implementation side, so that's something to keep in mind.
When do you want to get a taxonomist? So I would say that a taxonomist is very useful at just about any point in the process of developing and managing a website. So if you're trying to assess your current state and do a content audit, having taxonomy involved from that very beginning point is really useful.
If you're planning a redesign and you're restructuring things, it helps to have someone who comes in when you're working on your content types and says, okay, this should be a controlled taxonomy, this should be free text, as part of that metadata schema work. The same goes if you're implementing a new content type, or if you're trying to integrate multiple systems. This happens quite frequently, where you come into a situation and an organization has 12 different WCMSs because everyone's done their own thing and it's getting migrated to Drupal, and what you have is a whole bunch of different ways of talking about content or countries or subjects and you need to synthesize that into a single taxonomy.
When you're hiring a taxonomist, we would expect that they would have a master's of library science or its equivalent, or real-world experience. So people come to taxonomy from a wide variety of backgrounds. Some people are coming from library science, some are coming from technology, some are coming from UX and information architecture. They should have really good communication skills. A lot of taxonomy work is explaining to busy people who don't wanna be there what a polyhierarchical relationship is or what post-coordination means, and being able to make that engaging: don't give people more information than they need, and present and develop training materials that are engaging and explain the value and purpose of taxonomy and tagging so that people want to take the time to do it appropriately. Then there's having domain expertise. So I work across all domains. I just did some Coca-Cola metadata cleanup and then I'm gonna work on snowmobiles and then I'm gonna work on health, and I'm able to jump into those different domains with the help of subject matter experts.
But it doesn't hurt to have a taxonomist who already has a background, especially if it's something like medical journals or very technical equipment. If it's an aerospace industry digital asset management system, it might be useful for them to have that domain expertise. But generally speaking, we're librarians. If we don't know it, we know to go look it up, so we're pretty quick to jump into different domains. And then finally that technical expertise. So if you're asking a taxonomist to design something that is an ontology, that's part of a knowledge graph, that's very Drupal specific, it would be useful if they had that background. In some cases, you can have more of that agnostic approach: they're gonna work in spreadsheets, they're gonna interact with users, they're gonna design the taxonomy, and they don't need to be writing code to clean up old mappings between different taxonomies. They don't need to be engaged at that level of implementation.
All right, very quickly through the kind of work we do, because each of these bullet points could be a separate presentation. So first off, the discovery and assessment. This is a really big part of any of our projects, partly because of that domain area, the necessity to understand what is going on, who are the users, what are the organizational goals, what is the technology ecosystem, just like any web development project. We might do things like audit your current metadata and review all the taxonomies you have: how are they being used, how much content is associated with each one, how much does the current taxonomy follow best practices? So this is a sample of a heuristic assessment of an existing taxonomy. It's based on that ANSI standard, but we have our own kind of secret sauce as well that basically says you should spell things correctly and consistently through your taxonomy, and then we assign a score.
And that gives you this nice little heat map for stakeholders who don't really wanna get into the weeds, but they can be like, oh, we have a lot of orange and red here, that's bad, right? And we're like, yes, the orange and red is bad. We're gonna get you to a little more green by cleaning things up. So what is the length? What's the longest label you have? This comes up a lot. If you have a multilingual taxonomy and you need to talk about Ski-Doos in Finnish, it's a very long word. So that's definitely something to consider with multilingual as well. How is that gonna look on mobile? How are people interacting with it? And then some more library-nerdy things like pre-coordination and polyhierarchy.
Looking at the strategy, a lot of times people say, we're gonna have taxonomy. Yeah, we'll have some taxonomy, whatever. And that's about it. But it's really about getting into what your taxonomy needs to do, because there are different considerations for those different use cases. And I'm not gonna go point by point, but if you're using it for search relevance, it needs to do different things than if you're using it for navigation and browsing. You might have a taxonomy that's completely behind the scenes that's driving an ontology and no one ever sees it. You can do different things with that taxonomy than with one that's gonna be in the left-hand column and needs to expand and collapse, where people are clicking on it and there need to be 12 top levels or something like that.
So we're really defining the use cases, and we do this in what we call the taxonomy framework. So if I say we're gonna have a topic taxonomy, you need to think about what is a topic? Is a campaign a topic? Is an event? If you have a world Drupal day once a year, is that a topic or is that something else? Do countries go in there? If you're talking about Afghanistan, is that a subject or does that belong in a different list? So even these little semantic thought exercises about what is a country?
That one's really tricky. That one gets crazy. It seems straightforward. What is a country? What is a language? What is a topic? So having a clear definition. And then as I said, what does it do? Is it flat? Is it hierarchical? Does it need other relationships or metadata? What content types? If you have a biography content type, does that need a subject? Will that have a country? What does that mean in that context? What are the challenges and considerations for development? Is this gonna be hard to get sign-off on? Who do we need to talk to to validate this? And then where are we gonna get the words from? How are we gonna come up with that whole process?
Here's an example from a project I had. They had seven different public-facing websites, plus some product information management in the backend. And we had to kind of go through and say, is balancing work and family the same thing as work-life balance? Can we just call that one thing? If so, what should we call it? Sometimes it's just abbreviations or spelling. Sometimes, again, geolocation is tricky if somebody's saying Africa and somebody else says sub-Saharan Africa. Can we say that's the same thing? You know, is that close enough? Is North America the United States and Canada? No, but maybe that's okay here. And then you've got your internal codey short codes that are not useful for end-user display.
Again, more and more we're trying to be data-driven, trying to use best practices. If you're trying to figure out if you should say internet marketing, digital marketing, or online marketing, you can go look it up and see, wait, more people look for digital marketing, let's make the other ones synonyms. Here's where we were trying to find all the things people might type when they were trying to get to blood test, for a very technical system that wanted to call it hematological tests. So if you type any of those things, we wanted to drive you to hematological tests.
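As a minimal sketch of that synonym idea, assuming a simple in-memory vocabulary: each concept has one preferred label plus alternate labels, and any label a user types resolves to the preferred one. The `Term` class, the function names, and the alternate labels "blood tests" and "blood work" are illustrative assumptions, not something shown in the talk, and this is not how Drupal itself stores or resolves synonyms.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One concept in a controlled vocabulary (hypothetical structure)."""
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)


def build_lookup(terms):
    """Map every label, lowercased, to the preferred label of its concept."""
    lookup = {}
    for term in terms:
        for label in [term.pref_label, *term.alt_labels]:
            key = label.strip().lower()
            owner = lookup.get(key)
            # A label claimed by two concepts is a governance question,
            # so surface it instead of silently picking a winner.
            if owner is not None and owner != term.pref_label:
                raise ValueError(
                    f"{label!r} is claimed by {owner!r} and {term.pref_label!r}"
                )
            lookup[key] = term.pref_label
    return lookup


def resolve(query, lookup):
    """Return the preferred label for what the user typed, or None."""
    return lookup.get(query.strip().lower())


# Illustrative data: the alternate labels are assumptions for the example.
terms = [
    Term("Hematological tests", ["blood test", "blood tests", "blood work"]),
    Term("Digital marketing", ["internet marketing", "online marketing"]),
]
lookup = build_lookup(terms)
print(resolve("Blood Test", lookup))
```

Raising an error on a shared label is a deliberate choice in this sketch: when two concepts claim the same word, someone has to decide which one wins, which is exactly the kind of call the governance process exists for.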
This is very frequent taxonomist work that's just synonym enhancement and research.
All right, this is an example of implementing in Drupal. So when we actually get to that technology implementation stage, there's a schema called SKOS. It's just a technical specification for interoperable taxonomy. So it's basically saying we're all gonna call the preferred label prefLabel. And then if we wanna turn this into a Drupal schema, we kind of laid it out like this for the implementers. Now, every time we do this, it might be slightly different. We might use different SKOS properties to map to different kinds of Drupal things, objects, entities.
Now, here's what I really wanted to talk about. When we go through that first process, you feel like, great, we're done. We finished taxonomy, it's over. But a taxonomy is never done, just like a website is never done. And it's really important to bake in your governance processes. I get a lot of work out of organizations who built a taxonomy 12 years ago and haven't really engaged with it since. And they're like, we don't know what happened. It was great 12 years ago and we've been putting in everything everybody asked for over the last 12 years. And now it's a mess because there wasn't a governance process.
So who owns the taxonomy? Is it the web developers? Is it knowledge management? Is it editorial? You know, that's a big decision: where does that ownership of the taxonomy live? Who gets to okay putting something into the taxonomy? Who do you need to ask about, we're gonna add this new term? Do we need to check it with marketing? Do we need to check it with legal? Do we need to run it by subject matter experts? Who needs access to the taxonomy? This is a big one. Now, in an ideal world, your content creators go into Drupal and they have a beautifully permissioned, simple form to put their content in and they tag their content in Drupal with the actual taxonomy. That doesn't always happen.
You might have content being submitted by external consultants or partners, or maybe you just have an organization where nobody except for the web developers goes into the WCMS. That still happens quite frequently. So how do they see the taxonomy? How do they associate tags with their content if they're not in Drupal? Are they doing it from a spreadsheet? Do you have a read-only page? When does this happen in the editorial workflow? Ideally, it's first. Ideally, you do taxonomy first when you create your content. You say, what is the topic? Who is the audience? What places are relevant to this? And then build the content from there. Again, that doesn't usually happen. You have the web developer going in and being like, I don't know. I guess this is about lemons. We'll just put that tag there. And then who ensures the tagging is correct? That's QA and having a feedback loop, so that when you see a problem, like we see a lot of over-tagging in these news items, who do we need to tell? What kind of documentation? What additional training? How do we keep improving over time?
Oh, and I do wanna say that last part. If you're thinking we're just gonna use auto classification, all of this is even more important. Having that feedback loop, understanding what triggers changes and who's involved: the more automated and the more artificial intelligence you bake in, the more this needs to be clear and have that feedback loop, because it's not gonna be perfect. It's not gonna be good at first. It's gonna take a while to get there.
So this is just a kind of rundown of the kind of things that you need to document in your governance framework. So where do new requests come from? What kinds of changes? What happens if you split a term or merge a term? What does it mean to deprecate or archive a term? Does that mean it's no longer tagged on the content? Does that just mean you can't tag any new content? How do you tell people things have changed?
If you're adding new tags to your taxonomy, who's communicating that to the people who are using it? How do you keep the quality and integrity of the overall taxonomy? So the taxonomy has a semantic balance to it. If you end up having somebody over here get really aggressive and build out 12 levels of depth and over here they have two, then you have this weird difference of granularity where they're like, oh, Meyer lemons is a very particular kind of lemon, and over here you're like, and vegetables, and you're like, well, that becomes unbalanced. So there's kind of a feel for a good taxonomy that needs to be assessed every once in a while. And then what are the technical requirements for the governance? So the permissions around who can add a term, who can view terms, and things like that.
The training and communication, again, gets left out a lot. People are like, they'll go in and they'll see that there are tags. Do you know you don't need to tag with the parent and a child if you're leveraging the hierarchy in your views? That's a technical decision that then has training implications. Are people over-tagging? Are they under-tagging? Do they understand what it means to put this content in this field?
And then testing and assessing. So there are a lot of ways you can have that heuristic checkpoint for that kind of overall taxonomy health. But if you go back a year after you've implemented your taxonomy and only one piece of content has been tagged with a term, maybe you don't need that term. Maybe you were a little over-aggressive in building it out. Maybe 90% of your content is tagged to a term and then it's meaningless, because if 90% of your content is about a subject, it's not distinguishing it from other things. So again, that feedback loop of looking at how well your taxonomy is performing and making changes based on that.
This is just a quick shot of different kinds of user testing. This can happen at the beginning, early on. So that's a Miro board.
We used to do in-person card sorts, which are great if you can get people in the same room. The big pieces of paper and Post-it notes, and you hear people arguing. They're like, no, those are the same. No, they're not. And then people from different departments have totally different views of the content, and then you have to roll them up and take them on the airplane with you. You don't have that as much these days. So we do a lot of Miro boards. There's a tool called OptimalSort that lets you do tree testing and online card sorting, open, closed, and hybrid. So there are a lot of different things you can do at different points in the process.
Couple of shameless plugs. The Accidental Taxonomist is kind of the book. There's a third edition out and the foreword is by my business partner, Stephanie Lemieux. And then Taxonomies: Practical Approaches to Developing and Managing Vocabularies for Digital Information was published last year. I have a chapter all about search and taxonomy, but it's got quite a lot of very practical information and case studies in it. We have Taxonomy Boot Camp in November every year in Washington, DC. The first two days are training, so an actual boot camp on becoming a taxonomist, and then the second two days are presentations by people like me. We get Netflix and Coca-Cola and the New York Times, and it's very interesting to hear what people are doing with taxonomies. There's also virtual bite-sized taxonomies out of London. It happens about three or four times a year. We just finished for 2023 but they'll start up in 2024 with more of those. And obviously if you wanna learn more, connect with me online. You can go to our website. You can find me around here over the next two days. I'm on LinkedIn, I'm on Drupal.org, I'm on Mastodon. So feel free to come talk to me about all things taxonomy. Thank you. Any questions? Yes.
Thank you, that was a great talk. I learned a lot from that.
I wonder, I wanted to pick up on something you said about getting consensus and the kind of arguments. Do you have any tips for mediating that process or facilitating it? I work for a university so it can get very heated. Words are our currency and they mean a lot to many of our academics. So I wonder if you had any kind of practical tips about how to manage that?
Yeah, chainsaws just don't put it out. Yeah, I mean it's a huge part of it. And so part of that governance is gonna be where does the buck stop? Like who has the final call? I've worked with universities, and they are one place where there's definitely going to be different views from different departments, and then you have the student versus alumni versus staff and then the different kinds of staff. So one thing is understanding the governance in terms of who owns a facet. So who owns subject? Or we've done a lot with government organizations where everybody has a balanced input but some people own a branch a little bit more than other people. So we'll assign a subject matter expert owner for a branch. Hopefully you come up with a way to synthesize things, but the other alternative is to throw technology at it and come up with something where you actually display different labels or you have named relationships. So you can say these things are kind of the same. So for example, the doctors want it to say hypertension but we wanna say high blood pressure. We can actually display different labels in different contexts based on having that term metadata. Or you could have something like heart attack and myocardial infarction. They're not actually the same thing but most people treat them as the same, and we can capture that they both exist, and that gets into the more advanced kind of ontology side. But it's governance, ownership, and the really key point is having executive buy-in.
Have an executive sponsor as high up as you can who says taxonomy is real, it's important, and these people are in charge of it, and that seems to work in some organizations. So, yeah.
That's great, thank you.
Hello, thank you also. I also work at the university, same one. We have a taxonomy in our site and I have defined taxonomies which apply to the research institutes and the disciplines within the school, but I also have free-hand tagging. This is probably a bit more specific to Drupal, so just the free tagging. People just go crazy. Have you got any advice on how to tidy that up or retrospectively look at what's been put in there? We've got thousands.
We often throw that out as like, fine, you can have free keywords, whatever. Like, keep your craziness over here. Either they're thinking of it as hashtags or they're remembering when you had meta keywords and you had to put every word you can think of so that Google would find you. So, one is a little education about what does it do? What do keywords actually do? Does it drive something? Do they show up somewhere? So, tying it to a clear use case. Another is using it to look for candidate terms and promoting those into your real taxonomy. Like, oh, this is actually a trending topic. This can go there. And then the last one is you can have it lightly governed. So, don't let them just put it in. It goes into a queue and somebody says, no, we already have that. Nope, that's just a plural. Nope, we don't use that. That's camel case. And so, that's called lightly governed. It does take some bandwidth, but it basically treats it as a candidate submission and you say, sure, that's an upcoming event. It's kind of ephemeral. We'll let it be a keyword because that's our use case. So, clarifying what keywords are used for: trending, ephemeral, initiatives, campaigns, things not otherwise covered by the taxonomy. And having that role where somebody just looks through the list once a week and says, yes, no, yes, no, yes, no.
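The lightly governed queue described above could be pre-sorted before a person looks at it. Here is a minimal sketch, assuming candidate keywords arrive as plain strings: it flags camel-case variants, plurals, and exact duplicates of existing labels, and leaves everything else for human review. The function names and the naive "add or drop a trailing s" plural rule are assumptions for illustration only; a person still makes the yes or no call.

```python
import re


def normalize(label):
    """Lowercase, split camelCase into words, and collapse whitespace."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return " ".join(spaced.lower().split())


def triage(candidate, existing_labels):
    """Suggest a decision for a submitted keyword.

    Returns (decision, matched_label). The decision is only a hint for
    the reviewer who works through the queue.
    """
    known = {normalize(label): label for label in existing_labels}
    norm = normalize(candidate)
    if norm in known:
        # Same words once case and camelCase are ignored.
        return ("reject: already have that", known[norm])
    # Naive English plural check (assumption: trailing "s" only).
    variant = norm[:-1] if norm.endswith("s") else norm + "s"
    if variant in known:
        return ("reject: plural or singular of existing term", known[variant])
    return ("needs review", None)


existing = ["Open source", "Lemons"]
for keyword in ["openSource", "Lemon", "DrupalCon Lille 2023"]:
    print(keyword, "->", triage(keyword, existing))
```

Anything that comes back as "needs review" is where the weekly yes or no pass happens, for example deciding that an upcoming event is ephemeral enough to stay a keyword rather than become a taxonomy term.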
So, and you can build a cool workflow where people can see the status of their submissions.
Yeah, exactly, exactly. I think there was one in the back.
Yeah, you gave a couple examples of, I guess, signs of an unhealthy taxonomy.
Oh, yeah.
Specifically, over- and under-tagging. Is there some catch-all technique or, I guess, heuristic to quickly find that, or is it more just a feel when you're working with it?
My whole presentation at Taxonomy Boot Camp is about health metrics for taxonomy. So, there's two sides. There's the taxonomy itself, where you wanna look for things like, basically looking at the ANSI standard, which says, hey, you should avoid having single children, because what can happen is you have basically a single child of a single child of a single child of a single child and that's really useless for people. It's like opening a drawer and inside there's another drawer and then there's another drawer and then there's a fork. And you're like, just give me the fork. So, things like don't have too much polyhierarchy. That's repeating branches in places. Don't have single children. And then there's other ones that are squishier, like, look at the breadth and depth. It doesn't have a real hard and fast rule. You might have a taxonomy that has 30 levels because it's driving some huge ontology or something and nobody ever has to look at it. But if you're gonna expose it on a website, you're gonna say, let's keep it to three or four levels. So, there are really subjective and objective criteria. And then the tagging is the same thing, where you wanna look at what are the things that don't have any content assigned to them. And maybe it's because the label isn't good or people don't know the term's there or they don't understand to tag at the lowest level. So, it doesn't necessarily mean to get rid of the tag if no one's using it. It means you have to look a little deeper.
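Some of the objective checks mentioned in this answer can be sketched in a few lines. This is a minimal example under stated assumptions: the taxonomy is a strict tree given as a term-to-parent mapping, and tag counts come from some report you already have. The function name, the data shapes, and the default thresholds (four levels, taken from the "three or four levels" remark, and a 90% share, taken from the earlier point about terms that cover most of the content) are illustrative, not rules from the ANSI standard. The output is a list of things to look into, not a list of things to delete.

```python
from collections import defaultdict


def health_report(parents, tag_counts, total_items, max_depth=4, broad_share=0.9):
    """Flag terms worth a closer look.

    parents:     term -> parent term, or None for a top-level term
                 (assumed to be a strict tree with no cycles)
    tag_counts:  term -> number of content items tagged with it
    total_items: number of content items on the site
    """
    children = defaultdict(list)
    for term, parent in parents.items():
        if parent is not None:
            children[parent].append(term)

    def depth(term):
        # Count levels from this term up to its top-level ancestor.
        levels = 1
        while parents[term] is not None:
            term = parents[term]
            levels += 1
        return levels

    def share(term):
        return tag_counts.get(term, 0) / total_items if total_items else 0.0

    return {
        # "A drawer inside a drawer": terms that are an only child.
        "single_children": sorted(
            kids[0] for kids in children.values() if len(kids) == 1
        ),
        "too_deep": sorted(t for t in parents if depth(t) > max_depth),
        # Unused is a prompt to investigate, not an automatic removal.
        "unused": sorted(t for t in parents if tag_counts.get(t, 0) == 0),
        "too_broad": sorted(t for t in parents if share(t) >= broad_share),
    }


# Illustrative data only.
parents = {
    "Food": None,
    "Fruit": "Food",
    "Citrus": "Fruit",
    "Lemons": "Citrus",
    "Meyer lemons": "Lemons",
    "Vegetables": "Food",
}
tag_counts = {"Food": 95, "Meyer lemons": 1, "Vegetables": 12}
report = health_report(parents, tag_counts, total_items=100)
for check, flagged in report.items():
    print(check, flagged)
```

Note that parent terms can legitimately show up as unused when people are trained to tag at the lowest level, which is why the report needs a person to read it.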
I'm not sure if that quite answered, but you can come find me afterwards if you wanna talk about this.
Yeah, no, that answered my question, thank you.
But there's, yeah, there's a lot. I wanna get everybody to ask a question. We're a small group, it'd be kind of cool.
Hi, thank you for that, it was really interesting. I work for a digital publisher. So, I'm coming at this from kind of tagging and categorizing our article content. We have multiple tags on a maze. We've got category, topic tags, hidden tags, people, regions, organization, things like this, like which bank or which person or which country we're talking about. It's quite complicated, but I'm a front-end developer. So, I look at it from the point of view of using that information for content recommendations and pulling things into newsletters and then GA and everything else. Are there any tools that you know of in Drupal kind of like an accessibility checker, but for taxonomy? So, we can look at taxonomy items or entities applied against a piece of content. So, we could check if it kind of meets a requirement that we're looking for.
Would it be a quantitative requirement? Because you can make a view that says show me everything that has fewer than three topics, show me everything that has more than nine because there's something crazy going on.
Yeah, so we would be looking at making sure there's at least one category and one tag or something like that, so that we're not over-tagging but also not under-tagging.
Yeah, and there's a big question about, do you require things and then people get mad because there's a reason they don't have the tag, versus if you make it optional and then everybody ignores it. So, you could build a view that had some cool logic that says it has to have this or this or this. And then you would just basically get a report that says these are all the ones that don't have a topic or a keyword or a subject or something like that.
These are adrift, orphaned content or something like that. That would probably be your best bet. But otherwise, I would look at modules that are about constraints. So, you have to have one value, you can't have more than five, things like that. So, those are the two I'd go for. But I can't think of somebody who's built a taxonomy or metadata checker in general. But you could probably roll something out of those guys.
Thank you.
Any last one before we go off for coffee? Okay, thank you everyone.