Welcome, everyone, to this first instance of this seminar on conservation of biodiversity. This is part of Charles's project on responding to the taxonomic disorder. In a short sentence, this project is aimed at understanding and mapping the level of conceptual disorder around biodiversity research, mostly in conservation science. So this is the philosophical part of the project, which tries to see what the extent of the disorder is and find ways to solve it. And then my part of the research is more historical, so it's trying to see what the ambiguity of the concept does: what happens when this concept goes into the so-called real world, and what happens to people and things, and forests in my case. So the seminar is going to have, hopefully, both philosophers and historians. Hopefully biologists; we're trying to get biologists on the hook. We're trying to get lawyers; we're trying to give everybody who's got an interest in this idea a chance to come talk. Yeah, that's the idea. So welcome, and thanks for being here, or for being there online.
Yes, I hope this makes some sense. Like I say, I wasn't exactly planning on being here this afternoon. What this is, it's sort of a mashup talk. The first half or two thirds of the slides come from a project that I was already working on a little bit, and I'm kind of debating what to do with it, which is about philosophical and empirical research on ambiguous concepts. So how do we understand what happens when key concepts in debates actually don't seem to have static meanings?
There's some really cool philosophical work that's been done on this, but what I'm actually going to appeal to a lot in this talk is a bunch of really interesting work in management studies that has thought about this idea, because, well, as we'll see, management people are very interested in the question of, you know, does everyone in a giant corporation have to agree on meanings and goals in order to enable effective collective action, or can we get by with sort of weird, ambiguous, vague concepts? Then I'm going to turn to the bit that's tacked on at the end, which is really something that I'm in the middle of doing right now. I'm going to show you results that I put together in under 24 hours, which is a nice record. I was sitting in a coffee shop downtown yesterday trying to see if I could make this make sense, and I think it kind of does. But it's a question of measuring this ambiguity empirically. So how can we try to understand what's going on here in text, and whether we can or can't do that, what does that tell us about the ambiguity in the term more broadly? What does that tell us about our philosophical goals in understanding what's going on? That's the basic idea. The two parts are kind of connected to one another. They make more sense connected together in my head than they maybe do in these slides, but I hope it'll make sense. So, yeah, I'll start by thinking a little bit about ambiguity in science in general and in language more broadly. Because I think there is a really interesting analogy to be drawn between these kinds of large-scale, multi-actor management efforts for biodiversity and the management of other kinds of change in the world. I actually think that this angle, drawing this connection to organizational choice studies and organizational change, is really promising as a way to give us some levers to think with.
Again, as Max mentioned with respect to his part of this research, one of the things that's really driven me in putting this project together is that I'm really interested in thinking about what happens when a concept like this leaves the scientific community. Because biodiversity is obviously not just going to be about what an ecologist on the other side of town thinks that biodiversity is. It's also going to be about what the people at the Maison de Vélopement du Raab think that biodiversity is and what the mayor of Autini du Vananeu thinks that biodiversity is. And these could be radically different notions. And I think there's a really cool kind of trading zone notion going on with a concept like this. And so there are some positive aspects to ambiguity, and there are some negative aspects as well that I'll poke at. I'll talk a little bit about a taxonomy for ambiguity, so how you can think in a bit more general way about these ambiguous concepts, and some promising ways to study it. The people I've talked with a lot about this include Max, obviously. Also Beckett Sterner, a close colleague and dear friend who has done some amazing work on some of the empirical sides of ambiguity measurement. So I really see a lot of what I'm doing as dovetailing into some of the stuff that he's been thinking about. And also a bunch of this started in chats with Oliver Lane, who was a postdoc here a few years back on my first project. So this has all been very collaborative stuff. Okay, so a little bit about ambiguous concepts, right? Biodiversity is weird because it's one of these things that in fact very often we just sort of know it when we see it, right?
When I introduce this notion in philosophy of biology courses, or even when I teach this sometimes to master's level biology students, you start by just sort of gesticulating at stuff that seems obviously biodiverse, like that stuff or like that stuff. Like that's obviously really biodiverse stuff, right? So that's all fine. But we're stuck with biodiversity, right, in part because it's a concept that's always had this sort of practical, normative valence about it. It's sort of wedged in between a bunch of different demands on it. It has to be more than, by contrast, some work by conservation organizations which sometimes just focuses on whales or panda bears or something, right? It has to be more than just saving single, charismatic, big, important species, because it's supposed to capture something about ecological relationships in addition to just "save the panda bears." But you can't let it totally expand, because it's got to be something smaller than just "save everything that is alive," because it's supposed to give you guidance, right? It's supposed to actually be practical. And so, yeah, "save all the things" is not actionable conservation advice. And that's been true since the very beginnings of the concept. I mean, that's why it was introduced. We came up with a new word in the late 80s, early 90s because we wanted a term that had this sort of dual scientific and practical sense. The word only dates from 1988 or 89. It wasn't in wide usage until 90 to 92. And there are also loads of ways of measuring it in science. So this kind of ambiguity inside the scientific community is a lot of what people like Beckett have already worked on. So if you're a scientist, and especially if you're a scientist who wants to build large, interoperable big data repositories, you need definitions of your terms either to be clear or at least to be specifiable, so that you know what sense of a term a particular data set is invoking, right?
And so already in science, biodiversity is one of these problematically ambiguous concepts. The most popular definition is what we just call species richness. That's basically just counting up how many species there are at a place, although you fiddle the math a little bit because it's better to have more phylogenetically distant species than phylogenetically close species, right? That's kind of intuitive. So you fudge the math some, but this is a very well understood metric. But there's lots more. There are people who have expressed biodiversity in terms of a diversity of traits or characters. So people have said, you know, what's interesting is that evolution found all these ways to do stuff. What makes things diverse is, hey, there are mammals that know how to fly, and there are, you know, 75 different ways to solve the problem of nitrogen fixation in the botanical world. These kinds of things where you go, okay, it's about a diversity of ways of doing stuff, of traits. People talk a lot now as well about structural or community-level biodiversity. So what might make a system particularly diverse is that it has a lot of things connected in a lot of interesting ways, playing lots of different functional roles. People will talk about ecological niche diversity. That's a bit like the traits or characters thing, but it's more like a diversity of ways of life, right? Ways that you would draw nutrients from the environment and grow and reproduce and thrive, et cetera. And of course, more and more, as everything in biology becomes expressed in terms of gene sequences, you can express biodiversity in terms of gene sequences if you want to, right?
So there are absolutely initiatives to do this. The one that I remember, there was a giant NSF grant to basically dredge up the entire bottom of the lake behind the Woods Hole Marine Biological Station and grind it all up and sequence it, just to see what kinds of wild DNA are in there, you know? And they found a bunch of stuff they'd never seen before. Yeah.
A small question.
Yeah.
So I was wondering whether in general this notion is, as I would intuitively think, used more for larger species, because none of that excludes bacteria and so on.
Popularly, you're exactly right, although scientifically, more and more, there's lots of pressure in the scientific community to recognize things other than the large, charismatic megafauna, the big interesting stuff. One place, for instance, where I've seen a bunch of ink spilled over this is that there are a lot of people, I think for very good reason, making a lot of noise about soil biodiversity right now. There's a bunch of stuff that lives in healthy soil. If it dies, there are various kinds of extremely dire predictions about what that means for the ability of things to grow in the ground. So that's one place, for instance, where people are going, there are whole ecosystems here that are an important aspect of this and that we have to pay more attention to. And so part of the fight about these different kinds of concepts, right, is how we can get people to engage with the right kinds of systems in the right kind of way. Yeah, absolutely. Sure.
So there are a couple of ways you could respond to this, right? Big, big asterisk: when I cite Sterner et al. here, they're not defending this view, they're explaining it, they're just telling you about it. Beckett and company do not agree with this idea at all. But one response is what you might call fundamentalism.
You say, look, one of these things is right and the rest are wrong. There is a correct definition of biodiversity. We need to go out and sort of corral everyone and get them all in line about what the right answer is. And so they express this in terms of what they call the definitional consensus principle, which is absolutely what you see in places like big data biology, where people basically say, look, we just have to agree about this, or otherwise we're not going to successfully encode our results in these large interoperable databases. And so the design of a formal classification system for expressing a body of data should be grounded in a consensus about the definitions of the entities that are being classified. So: ambiguity bad. Ambiguity is a thing to be driven out of scientific practice so that we can be clear about what we're doing with the data that we have. And this is very old, right? This goes back to the oldest thinking about what language is for and why we have it. Here's Aristotle, from the Rhetoric. We can start, then, from what he had said elsewhere in the Poetics, and the stipulation that language to be good must be clear. It is proved by the fact that speech which fails to convey a plain meaning will fail to do just what speech has to do. It is the excellence of speech that it be nonambiguous. So there's something very deep and intuitive about this, which is why I want to be clear that when I argue here in a bit that there are times when ambiguity is good, I absolutely mean to be saying that that's a vaguely counterintuitive thing to say. I think the classic, default, intuitive position is that ambiguity is bad and that the scientific process would be better off if we managed to eliminate it. I just want to underline that I don't mean to be saying that that's an obviously false position. Another response that you could have, one that I don't want to have, is you could just be a skeptic about this.
So you could just basically say, well, look what this means, right? The fact that we can't figure out what biodiversity means shows it's kind of a garbage word. It sort of means whatever the people involved with the science or the politics want it to mean. So, Sahotra Sarkar very famously has argued for this position at some length. Biodiversity is just whatever it is that conservation biologists say they're going to save. And so it doesn't have any independent meaning. They're just using it to kind of track their own practice. They're just giving a label to what they already wanted to be doing anyway. You can go even farther. This was a paper by Carlos Santana, the other Carlos Santana, a buddy of mine, a philosopher of biology. Really fantastic paper, provocative. I mean, "Save the planet: eliminate biodiversity." But the idea is, yeah, this word is in the way, right? It's unclear what to do with this concept. It's not actually carrying any conceptual load or doing any work at all. We'll all be better off if we just throw this word into the sun and eliminate it from practice. And this is, yeah, one of my favorite kinds of paper, right? It's a provocative, well-argued paper for something that I don't actually think I agree with, but it's really well written and it's interesting. So I strongly recommend that. So those are your two kind of classic options, both sort of turning around the idea that ambiguity is bad, right? Either you can say ambiguity is bad, so we have to fix it, or ambiguity is bad and we can't fix it, so we have to get rid of the concept. What if you went the other way, right? What if you asked, so what if ambiguity is actually good sometimes? And there is an increasing body of literature that says that this might actually be the case. Some of the oldest literature comes out of STS and history of science perspectives.
So, Star and Griesemer's work on boundary objects, these sort of entities that exist between domains of science and that are constantly renegotiated as people are developing fields. Great classic paper. The gene, for instance, is one of these, a great candidate for a thing where, yeah, it depends. What a gene is depends on who's asking and why they're asking it, right? Sort of what they're doing with their science, how they want to negotiate their relationships to other fields and other objects and other techniques. It's a really nice idea. Lots of other people have jumped in on this with respect to biological scientific concepts, or the epistemic goals of scientific concepts, in Brigandt and canwars. Celso Neto. Those are all great papers. A little bit of stuff on publication as well: McMahan and Evans do a large-scale empirical analysis to show that actually being ambiguous gets you cited more. So why not? More people will read your stuff. And then, yeah, Beckett has talked a lot about ambiguity being potentially good. This is a brand new paper in the context of big data. He really pushes the argument there for the idea that ambiguity in a big data context could be good. Not just in science, though. There's this great old paper from the 70s by Page where he actually argues, look, ambiguity is really important in politics, and does a very technical breakdown of the fairly intuitive and obvious idea, right, that being vague lets more people think you're a good politician, and then they like your ideas. Intentional, deliberate vagueness is an asset politically. Joya argues the same for strategic vision statements. This is moving toward the kind of business and management literature that I'm going to pick up in a bit. And lots of others. So there are a lot of these kinds of targeted analyses about ambiguity maybe actually being helpful. I want to shift this a little bit.
So what I think distinguishes the way that I want to pose this question is that most of this literature, at least, and especially in the context of the philosophy of science, is about scientific objects and scientific knowledge. Either stuff, putative sequences of DNA, genes, et cetera, or exchanges between scientific disciplines, conceptual exchange. How do we facilitate interdisciplinary work? For instance, that McMahan and Evans paper that has the empirical analysis of citation and ambiguity is talking about the idea that maybe being ambiguous is good because it lets people engage with your work from an interdisciplinary perspective. They don't have to buy into everything your field is selling to be able to use your work. But again, I want to push this broader angle where we get out of the scientific community. Biodiversity is the kind of thing that also works at this interface between science, government, NGOs, the public, the media, et cetera, et cetera. Is there a way we can look at ambiguity in that context? So in particular, can we evaluate the use of ambiguity in what I'll call genuinely ambiguous scientific concepts, of which I think biodiversity is a paradigmatic example? These are places where it's a bit presumptuous to think that we're going to drive the ambiguity out of scientific practice. And then what happens when we extend that and look beyond the scientific community? So that's the question that I kind of want to play with here. And I'm going to play with it by, again, looking at this weird, super interesting literature. It's always one of those fun experiences as an academic to discover that a bunch of people in a field that you've never heard of, or never thought would be relevant, are actually all talking about a problem that you're already interested in and think is really cool.
And for me, that was discovering that, yeah, management and organizational choice people have been thinking about exactly this question for about 40 years. I don't want to say there's a consensus, because it would be presumptuous of me to say that I know the entirety of the field well enough to know whether or not a consensus exists. But what I'll say is there's a strong tradition in that literature of arguing in favor of the utility of ambiguous notions. And so here's Eisenberg, 84: the overemphasis on clarity and openness in organizational teaching and research is both non-normative and not a sensible standard against which to gauge communicative competence or effectiveness. People in organizations confront multiple situational requirements, develop multiple and often conflicting goals, and respond with communicative strategies which do not always minimize ambiguity but may nonetheless be effective. So the idea is non-normative in the sense that maybe it's not necessarily a good thing to focus on clarity and openness, and also we shouldn't be indicting people for failure to communicate clearly. What people are doing is communicating in ways that fulfill their goals in the weird contexts in which people actually have to engage in communication in large organizations. And when I read this long last sentence, this sounds a lot like the biodiversity management world, right? This is what it's like when you look at people fighting about what to do about biodiversity. A bit more recently, Jehu here is arguing that pragmatic ambiguity is a practical solution to the difficulties of collaborative action in situations where different points of view and conflicting interests could lead to organizational paralysis.
And again, what I think is really cool here is that the vocabulary might be a little bit different, like you probably wouldn't see the word organizational, but this sentence is not that different from what might appear in a social epistemology paper, right? Which is kind of cool. There's a serious overlap here, and I think in a really interesting and useful way. So why are they arguing in support of ambiguity? What is it that they see that's potentially useful about ambiguity in communication in these contexts? So these first two come from Eisenberg. Ambiguity allows for multiple representations of goals to exist despite possible underlying disagreement, right? So we might disagree about means, and we might disagree in particular about underlying value commitments, right? But if we can all find ways to represent our goals that at least overlap with respect to some of these ambiguous terms, we might get levers for action. It also enables response to change in shifting environments. So your terminology is flexible enough to sort of absorb a little bit of external change. I think we see this especially in the context of biodiversity. I mean, I don't think it's a stretch to say that what conservation looked like in 1993, early in the years when biodiversity was being developed and promoted as a conceptual tool in conservation thinking, is not what conservation looks like in 2022, right? It's a radically different discipline, and it's responded to some radical shifts in the external environment. Some of those are economic. For another, you might find a little, but I'm thinking you're not going to find that much discussion of climate change in a 1993 paper on goals for conservation, and now that's become such an important axis of external shock to thoughts about biodiversity conservation. Biodiversity has stayed flexible enough to kind of let us absorb that change.
A bit from Jarzabkowski; I'm going to talk a lot more about Jarzabkowski et al.'s paper here in a moment. Ambiguity lets you sign on to a sort of higher-level meaning of a goal without contradicting your own interests, right? I often present this in joke form. I think it's probably true at this point, in 2022 in Europe, that if you get 15 people around a table from literally every walk of life, from golf course real estate developers to politicians to ethicists to biologists, everyone at the table is going to say that they think that conserving biodiversity is important. That's pretty much a universally accepted statement. I don't even want to say proposition, because I'm certain that the propositional content of that sentence is not the same for everyone, right? So that's what these ambiguous concepts can let you do, right? They can let you sign on to a goal even if the way that somebody else at the table is signing on to the goal cuts against your own interests. And similarly, and this is a lever that I'll talk about a lot more in a second, perhaps they let us shake out of these disagreements, or these differences of interpretation, levels at which there actually is enough agreement, especially to facilitate short-term, local collective action. And that can be huge. That may be all we can hope for. And so ambiguity is an important part of getting us there. I think that makes it really interesting. There are of course bad things, right? And so this is another part of what I'm trying to do with this project: to give us tools to think about how to evaluate instances of ambiguity, to see whether we're getting more of the good stuff or more of the bad stuff. What's the bad stuff? Well, you can plausibly deny the unwanted consequences, right?
So you can say that you only ever did what you wanted according to the way that you understood the ambiguous notion, and so you say, well, I never meant to cause all this massive harm, because it wasn't part of the way that I was understanding the problem. It might re-entrench existing power differentials. And this is something that, well, it's actually kind of hilarious: there's a long tradition in the organizational change literature, because it's where those people are, of examining organizational change at universities, and reading papers about organizational change at universities is really funny. But one thing that is often underlined in that context is, yeah, how many times can you think of a university higher-up explaining something to you in an ambiguous way, where it's clear that the reason the ambiguity is there is for the person to be able to have the power to do exactly what they wanted to do anyway and not take you seriously, right? And ambiguity is very good at that. And this may be a relevant lesson in a biodiversity context, right? You can imagine the real estate developer using the term in a sufficiently ambiguous way to still get to build the golf course, right? It also enables a proliferation of multiple meanings, which obscures action, right? So on the one hand it might enable certain kinds of action, and on the other hand it might hide certain kinds of ways forward that you may not be able to see. And then there's a really, really cool article that I wish I had time to unpack here, about a theory of organizational change that they call the garbage can model. Ambiguity permits the appearance of decisions that don't actually resolve any problems.
So what you might see happen is a situation where, essentially, we all agree that we need to do something, but problem X is sufficiently vague, and we move sufficiently slowly, that over a period of time we basically evacuate all the content out of problem X and turn it into different problems, or move it somewhere else in the organization. And then we say, oh well, you know, problem X is actually really easy to solve now, because the term has remained vague and we've just dumped out all the content. Everything that was hard about the problem has been shoved away somewhere else, and so we think we've made some kind of great decision to solve some kind of important problem, but in fact we haven't actually resolved any of the stuff that had pushed us to want to make the choice in the first place. That would feel real bad, like the universities, right? Yes, we're going to fix the problem with blah, and then actually you take all the parts, you give them to someone else, and then you declare that you solved the issue. Right? So let's get a few more tools. How do we think about ambiguous language with a bit more clarity? I'm going to draw this taxonomy out of this wonderful paper by Paula Jarzabkowski et al., a group in the UK. They did a really long longitudinal study on a giant change that took place at some UK business school; it's anonymous, I don't know which one. So they wanted to get a certification. There was all this internal fighting about whether it was going to be worth it, what it would mean, why would we want it, and if we're going to get it, how are we going to get it, who's going to have to do the work, and what are we actually going to change? And they analyzed it in painstaking detail. They sat with thousands of pages of minutes of meetings and position papers and documents. They analyzed all the rhetoric surrounding this fight, and they did personal interviews as well. They did all kinds of stuff.
That's actually problematic for me, and I'm going to come back to that, right? How can we do that in a context that's as dispersed as this one? And they found that this rhetoric shared some characteristics. So first, we have two axes that we can sort of cut this rhetoric up along. On the one hand, people's rhetoric usually tended to be either situated, so particular to the group, right? In-group specific. We're going to build it in terms of our position and our interests, right? We're going to structure the way that we define an ambiguous term in terms of how we see the state of affairs. Or it can be accommodative, where you're intentionally trying to express your position in such a way that you're letting in other people's interests. And it really did tend to be that this was pretty bimodal. People tended to be doing either one or the other more or less most of the time, right? The second axis is similarly bimodal. The meanings that get ascribed to the concept tend to be either narrow, so minimally ambiguous, usually constructed from a single perspective, or wide, where you're baking into the meaning of the concept the idea that there are divergent or conflicting interests or goals about its use. So that gives you four different kinds of rhetoric. And I've even kind of tailored these to the biodiversity question here. Situated narrow rhetoric, right? That's going to be stuff like scientific journal articles or internal corporate reports, right? We're talking to our people in terms of our values, and we're defining concepts in terms of how we think they should be understood. We're not talking to anybody else. So yeah, scientific journal work. Perfect example, right? Situated wide rhetoric tends to be used when we're trying to argue in favor of one view against our competitors, right? So I'm situated in the sense that I'm coming from my own position's perspective.
But I'm trying to define the term in a way that's going to convince other people that my perspective is right. So I have to meet them part of the way in terms of the meaning of the term. But I'm still sort of fighting for my own side. Accommodative wide rhetoric is the big happy vision statement, mission statement, IPCC report, where we're trying to say everyone is in the tent and everyone has their contribution to the meaning of this very broad concept. Accommodative narrow is perhaps the most interesting from our perspective. So we're talking about narrow meanings. We're driving a stake in the ground. We're fixing the meaning of the term reasonably precisely. But we're being accommodative about our goals and our position at the moment, to try to let other people in. And what they argue is that it's actually here that you saw a lot of collective action potential emerging. So people are fixing the meanings enough to give us a way to talk more broadly about our broader goals. And that's really interesting. What can you do with this? Well, again, there's an empirical observation that they draw from this giant analysis that they did: everybody used all types of rhetoric over the three years, rather than converging on one position or the other over time. People were able to shift between the types of rhetoric as they saw fit to justify and validate their own and their colleagues' organizational interests and actions, often adopting positions of each type during the same passage of speech, interview, or meeting. So people are bouncing back and forth all the time. People are going to play with this even over the course of one long meeting. And so that means that it can't be a question of giving normative privilege to one of these kinds of speech, because we're not going to be able to drive the others out. That's not how communication actually works in these kinds of contexts.
What I think we can do is start to ask the question about the contexts where those kinds of engagements occur. So maybe we can talk about when it is that we wind up in an accommodative narrow context. What does that mean? When do we get there? What are we doing when we're in those kinds of contexts? And how should people engage in each of those kinds of contexts? So that you can kind of, you know, call a foul on people, essentially: we're in this kind of context, and that's not the way you should be manipulating an ambiguous concept. There are a lot of worries here. Not least among them, of course, is that communication is essentially a three-place relation, right? You've got the goals of the communicator. You've got the linguistic choices made by the communicator. And you've got the interpretation put on that communication by the receiver. That's a lot of data. That's a lot that you need to know about communication. It's not clear that we have what we need to study this empirically, data-wise. And I'm going to come back to this in a bit, for my last kind of wildly speculative 15 minutes of the talk here. You know, Sterner has even argued that this is true within science. We don't even know what we need to know about what the scientists are doing to be able to look at the ambiguity inside scientific practice, much less if I want to do this crazy thing and look at what happens when it leaves the scientific community. So if it's already a problem there, I'm just making the problem worse. Sterner has tried a little bit, in I think a really interesting and promising way, to think about how we could look at this empirically, how we could understand the presence or absence of some of these kinds of ambiguity in textual contexts, at least. So, in his words:
Here, it is possible to formulate testable generalizations relating how often and in what context the term is used, the indicative goal being prioritized, and circumstantial factors of use, including the linguistic and social context available and the background knowledge of participants. That should be a thing that we should be able to do. How to do it is the fun and exciting challenge. I'll jump through Sterner's proposal very quickly. It revolves around partial synonymy networks. These are clusters of terms like function, evolutionary function, and biochemical function, where we kind of know that they overlap, but they don't totally overlap: they're not quite the same, but they're definitely sharing some semantic space. You quantify that kind of ambiguity as a kind of entropy measure across contexts, and it would look something like this. This is an equation out of this paper. The entropy of a network across contexts is just a sum: the entropy of the uses in each context, times the probability of that context. So it's kind of a classic linguistic entropy measure. The lower that is, the less ambiguous the use of the terms in the network is going to be. As I understand it, as of the last time we had a meeting about this, they've fleshed this theory out and they think it's going to work; they're trying to figure out what data to apply it to and how to get it to work in practice. So they're working on it. So now, official rank speculation alert. This is where the talk goes weird. This is where I started trying to think about how I would analyze this with the kinds of tools and data that I have available to me. What that's looked a lot like lately for me is topic models; that's what I've been spending a lot of my time playing with in my recent digital humanities work. What's a topic model? 
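Before turning to topic models: the entropy measure described in words above could be written out roughly as follows. This is only a reconstruction from the spoken description, not the equation as it appears in Sterner's paper, and all of the notation is assumed here: $N$ for the network of partially synonymous uses, $C$ for the set of contexts, and $u$ and $c$ for an individual use and context.

```latex
H(N \mid C) = \sum_{c \in C} P(c)\, H(N \mid c),
\qquad
H(N \mid c) = -\sum_{u \in N} P(u \mid c)\, \log P(u \mid c)
```

On this reading, a network in which each context strongly favors one use gets a low value, which fits the remark that lower values mean less ambiguous use.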
If you don't know what a topic model is: a topic model is a tool that automatically splits a corpus into something that actually looks a lot like contexts in Sterner's sense, via a machine learning approach. You tell it the number of topics you're interested in finding, and it's going to tell you which documents invoke those topics with which probabilities, and which words are involved in those topics with which probabilities. I'll come back in a little bit to some different intuitive ways to think about what a topic model is doing. What are some of the advantages of this? One that I actually didn't put on this list is just that it's a very mathematically clear object. What a topic model is, in the end, is just a set of probability distributions, and we can do lots of really nice math on such buckets of probability distributions. This is a thing we can really play with, which is part of why I like it. Once you've asked the computer to derive all of these distributions for you, you can start to fiddle with stuff. They're document-level rather than word-level, so that might contextualize things a little bit. Of course, for those of you keeping score at home, you may also remember that Fischer-Polsky et al. just argued that people actually switch what kinds of meanings they use, and in what kinds of contexts, sometimes very quickly. I don't yet know what that means for a method like this. It's something that I am just, I think, going to have to get a feel for empirically. But in lots of cases I think it's fair to say that the overarching use of a given meaning of an ambiguous term in a particular published piece is probably going to be fairly static, except in cases where people are engaging in explicit comparison or trying to use more than one context at once. Like I say, I think it's an empirical question whether this approach can pull this apart. Documents can be in multiple topics, so topic membership is probabilistic, and that's really nice. 
There are lots of nice, well-known analyses of the robustness of the results, again because these are mathematically robust objects. And topics tend to be interpretable, and this I think is really nice as well. I'm not going to dwell on it right now, but one problem that I think you might have with an approach like Sterner's entropy approach is that you're going to have a bit of trouble interpreting what a context is. If a context is just, say, a window of words around the use of the term that you're interested in, that doesn't really tell you what it is. What are they talking about when they use biodiversity? The results of a topic model tend to give you a really nice handle on that. They really do tend to look like topics in a colloquial sense. They tend to pick out, say, "this is an evolutionary ecology topic", and so papers that are high probability for this topic tend to be about evolutionary ecology. So the machine learning results really seem to grab something a bit more comprehensible. The intuition that I want to play with here is this: if biodiversity appears in lots of different topics that seem like they're about wildly different things in a corpus, then it seems like either each of those invocations of the term is carrying a different meaning, and then we could explore those by unpacking the topics, or the term is serving as a bridge between different topics. I don't know how to tell those two things apart yet. But let's start with just a little basic question. What I've been playing with lately is: how do you quantify this? How do you start? Another intuitive way to think about what a topic model is doing is that it's inducing a kind of k-dimensional vector space over a corpus, and that vector space gives you an at least locally optimal model for each document as a vector in the space. So you're deriving this, with a Gibbs sampler usually. 
You're deriving both the basis vectors and the vectors for the documents at the same time. That's why it's a complicated machine learning approach. I'm not telling it what the basis vectors are in advance; it's figuring out the basis at the same time that it's situating the documents. And then each basis vector is just a probability distribution over every word in the corpus. So you're thinking about every document as being a mix of these topics, where each topic is a distribution over the words, and that's what gives you the words in your document. It's sort of a really weird model of how you would write a paper, as iterative sampling from probability distributions, and then you infer those probability distributions as though they were actually how the papers were written. So what can you do with the results of one of these? Well, easy things are, say, cosine distance between either the basis vectors or the vectors for the documents. That answers a question, intuitively, like: how similar are the topics, or how similar are the documents? Cosine distance is very easy and very well understood, and there are actually five or six other distance metrics for distances between probability distributions like this, so there are lots of different ways you can choose to measure how far apart these things are. Another thing that's very easy is to inspect the content of the probability distributions themselves. There are two kinds of distributions, so there are two kinds of questions. First, how important is a given word for a given topic? So in topic number 7, how probable is the word biodiversity? That's just a conditional probability. And the other question that's very easy to ask is: given a document, how important is a particular topic? So how probable was it that a word from document number 19 came from topic number 7? Very easy questions. Other things are progressively more difficult. 
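As a rough illustration of those "easy" queries, here is a minimal sketch in plain Python. Everything in it is assumed for the example: the two tiny matrices stand in for a fitted topic model, the numbers are made up, and the function names are hypothetical. Hellinger distance is shown only as one example of the other available distance measures; the talk does not say which ones were meant.

```python
import math

# Toy stand-in for a fitted topic model (all numbers are invented).
# topic_word[t][w] = P(word w | topic t); each row sums to 1.
VOCAB = ["biodiversity", "species", "gene", "protein"]
topic_word = [
    [0.40, 0.50, 0.10, 0.00],  # topic 0: ecology-like
    [0.05, 0.15, 0.40, 0.40],  # topic 1: molecular-like
]
# doc_topic[d][t] = P(topic t | document d); each row sums to 1.
doc_topic = [
    [0.9, 0.1],
    [0.2, 0.8],
]


def word_given_topic(word, topic):
    """How probable is this word in this topic?"""
    return topic_word[topic][VOCAB.index(word)]


def topic_given_doc(topic, doc):
    """How probable is it that a word in this document came from this topic?"""
    return doc_topic[doc][topic]


def cosine_distance(p, q):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return math.sqrt(
        0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


print(word_given_topic("biodiversity", 0))
print(topic_given_doc(1, 1))
print(cosine_distance(topic_word[0], topic_word[1]))
print(hellinger_distance(topic_word[0], topic_word[1]))
```

The same two distance functions work on rows of `doc_topic` if the question is how similar two documents are rather than two topics.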
So this is kind of your palette for thinking about simplistic analyses on the basis of topic models. And so, I don't know, here's a guess; here's an entirely speculative model for a way that you might do this. You want to ask something like the question: to what extent is a word used in topics that are radically different from one another? And you've got to start somewhere, because you're in a non-oriented vector space. So you pick a central topic T, and I decided, entirely arbitrarily, that we could call that the topic in which that word is most likely. So what's the topic that talks most about biodiversity? The topic for which the probability of biodiversity is the highest. And then we can compute the ambiguity as a sum over all the topics: the probability of that word in that topic times the distance between the central topic and the topic at issue. So, distance between topics times likelihood of word in topic. It's the simplest thing that could possibly work; this was the most basic thing I could think of. That's largely because proving that anything works in digital humanities from first principles is almost impossible, so my goal is to start with the simplest thing that could possibly work, because maybe I can justify that to an angry reviewer. Anything more complicated than this is going to be very hard to justify to a reviewer. What do you get? Oh, I should also say: I wanted to play with this, right? But I don't have a corpus that I think would give interesting results for testing biodiversity on. That's for lots of reasons that I'm going to come back to in a little bit. One thing that's very wide open for me still is: what is the right corpus for exploring biodiversity with these kinds of methods? I don't know yet. So I just sort of scrolled through my hard drive until I found a topic model that I already had. In this case it was the entirety of the biology part of Proceedings of the Royal Society from 1907 to 2014. 
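The speculative score described above (pick the topic where the word is most probable as the central topic, then sum probability-in-topic times distance-from-central-topic over all topics) can be sketched as follows. This is one reading of the spoken description, not the code actually run for the talk. The function and variable names are hypothetical, cosine distance is just one possible choice of distance, and the three-topic matrix is invented toy data.

```python
import math


def cosine_distance(p, q):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def ambiguity(word, vocab, topic_word, distance=cosine_distance):
    """Sum over topics t of P(word | t) * distance(central topic, t).

    topic_word[t][w] is P(word w | topic t). The central topic is taken
    to be the topic in which the word is most probable.
    """
    w = vocab.index(word)
    central = max(range(len(topic_word)), key=lambda t: topic_word[t][w])
    return sum(
        topic_word[t][w] * distance(topic_word[central], topic_word[t])
        for t in range(len(topic_word))
    )


# Invented toy model: "evolution" shows up in two dissimilar topics,
# "nucleic" shows up in only one.
VOCAB = ["evolution", "selection", "nucleic", "enzyme"]
topic_word = [
    [0.40, 0.60, 0.00, 0.00],  # topic 0
    [0.30, 0.00, 0.00, 0.70],  # topic 1
    [0.00, 0.00, 0.50, 0.50],  # topic 2
]

for term in ("evolution", "nucleic"):
    print(term, ambiguity(term, VOCAB, topic_word))
```

Passing a different function as `distance` is a cheap way to check whether the ordering of terms depends on the choice of distance measure.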
I happened to already have a nice 75-topic model of that. And so I just played with stuff and, hey, nice: you get results that kind of seem to intuitively track what we would expect. I just picked some terms at random, and I thought the results seemed to make sense. The ambiguity of commas is higher than the ambiguity of evolution, which is much higher than the ambiguity of the term nucleic. That tracks what I would have pre-theoretically expected. Evolution is used in all kinds of different contexts in biology. Nucleic is not. So this sort of jibes with the kinds of results that I thought I would see. It's a long way from being anything very meaningful, and it leaves me with a bunch of questions that I don't know how to answer, so I'm going to hand-wave at them. Is this justified from first principles about these probabilities? I have no idea. I have absolutely no idea how to make an argument that this is actually the kind of thing that I want to track. Is it sensitive to the choice of distance measure (I should say distance measure, not similarity measure, although they're just the inverse of one another), or is it sensitive to the choice of central topic? In my very limited testing, no. You would expect it, I think, based on the way the distributions are constructed, not to be sensitive to the choice of central topic. That's analytic. The distance measure is less clear, but it seems to give the same kinds of results, the same orders of results, regardless. Does it fail in some important way that I haven't thought about yet? Probably. I do not know. I have been just ludicrously tired for the last week, so this algorithm, this talk, could just be a fever dream at this point. I don't know. But I think there may be something interesting here from a technical standpoint. But now let's back up. How would you apply this? 
Whether it's my method or Sterner's method, let's ask this question: what good is it to quantify ambiguity in these kinds of contexts? I'll play with that question before I talk about what's on this slide. I do think that it's important, in part because the intuition that there is ambiguity here is often completely unexplored in these literatures. I mean, it really is. It just sure feels like it must be that we don't all mean the same thing. I think being able to really demonstrate that empirically would be really nice. I also think, and this is where I'll come back and make a little hand wave in the direction of these topics being interpretable, that if we can use an empirical analysis to teach us about the relationship between these contexts, that's really interesting. I don't entirely know yet how that looks. I mean, these distance measures are going to be relevant in my context, for my approach. I'm not entirely sure how that looks, but I do know that having a little bit of an empirical assist when I want to break down one of these analyses would be really helpful to me. Getting a little bit of an idea about where to look would be cool. Lastly, there's still a big problem right here, and that's what corpus I should throw at this. This is problematic for at least two different reasons. Important reason number one is that it's just not clear what text is the right text. It's not clear how to get enough people talking about biodiversity in enough different ways to make it an interesting object of analysis. To be perfectly frank, I suspect that this is one reason that this kind of project, looking at this concept at the science-society interface, has never really been conducted. It's much easier to go collect 43,000 articles about taxonomy. I actually have that corpus. I have those. That's fine. 
So if I want to know how taxonomists think about one of these concepts, I can ask that of the scientific literature: scientific meetings, publications, presentations. This is all a well-understood area. Where do I go to ask that if I want to try to figure out what happens when that leaves the scientific community? It's not clear. So problem number one is very fundamental. Problem number two, I should say, is a little bit more technical. And that is that if the documents are really different, if they're written in really different ways, there's a sort of worrisome tendency for topic models to turn into basically type classifiers. The computer is very good at noticing that internal corporate reports aren't written like scientific journal papers. And so it may be that what I'm getting is not a measure of differences in contextual use of a term; what I'm getting is how distant the computer thinks the style of a corporate report is from the style of a journal article. That's not really meaningful. So I don't really know yet. For me this is something where I think I just have to try a bunch of empirical tests. I think I just have to play with stuff to try to find out whether that's a surmountable problem or not. And it may not be; it may just be that there's nothing here but a sort of type classifier, and so this method doesn't work if you leave science. It works if you give it a bunch of reasonably identical journal articles, but if you get any weirder than that, it won't tell you anything. So with that, I think that's all I wanted to say. Shall we take five? Let's take five. Yeah, it's weirder than I thought it was. I guess I have several questions, but maybe I can start with the last thing that you mentioned, about what kind of corpus you would choose. Maybe it's more complicated, but for me, isn't it just a matter of choice? 
It's a matter of choice and of access to that corpus, and also, for me, it would maybe intuitively be a matter of interpretative choice, of saying: I'm interested in this sort of question. And this connects with something else; perhaps I missed it, but when we quantified ambiguity, as we saw with the example of evolution, chance and gene, there you compare three different concepts. But here we have something different: we have biodiversity, compared to what? What I'm asking is, for example, could we quantify change in ambiguity over time? Or could we compare disciplines and say that technical forestry is actually much less ambiguous than scientific forestry, for example, or that conservation biology is much more concrete than evolutionary biology? So if you had a specific question in mind, wouldn't that solve the question of the corpus? Yes. Okay, two really good, linked questions. Sorry, I forgot to get my notebook. Yeah, so, good. I think it makes sense to take them in turn. On the one hand, of course, in a pragmatic sense you're obviously right: obviously a decent part of what's going to drive this is just what I can get my hands on, what makes sense. But in terms of that broader question, about what the desiderata would be, I think that's why I pulled this slide back up at the end, and this will segue nicely into the second half of your question. If the point is to try to figure out how ambiguous people are being about this term, and in what kinds of contexts, then I do at least want the corpus to be broad enough that it captures some of these importantly different meanings, right? That it has not just the way that one community thinks about the idea, or more precisely writes about the idea, but that it has how multiple communities write about the idea, so that we can try to 
detect, well, how differently they use the word when they write about it. Now, that gets to your second point. I wasn't clear about this, and that's a very nice point, so I should say more precisely what I did there and why. The reason that I showed those couple of test numbers was not because that's what I think I would do in this context. It was just to say that the metric seems to intuitively track ambiguity as I might expect it to appear in that particular corpus. So what would you do with that metric in a biodiversity context? As you say, you've got one concept; you don't have a lot. On the one hand, the absolute value is already interesting, right? Because that's a bit of an empirical handle on this question of whether the term is actually ambiguously used or whether, as it turns out, there is less ambiguity here than we thought. For me that's still an open empirical question, at least in the way that it might appear in writing. It may be that there's more, if you will, practical contradiction than there is linguistic, textual, conceptual contradiction, right? It may be that all of those people around the table who say that they want to conserve biodiversity actually even agree about what biodiversity is, and are just not willing to act on it in the same kinds of ways. For me that's still an open question. So that's part one. Part two is this, and this is where I'd have to play with it, and I don't have any good ideas about it yet. I gestured at this really briefly, but too briefly, so let me say more. I would hope that part of what being able to quantify this would let you do is unpack and compare the contexts that contribute most to ambiguity. So you could look at the different terms in this sum, right, and try to figure out where it is that there are, in fact, you know, 
where there are uses where both the left-hand term says the word is really important to a topic and the right-hand term says that topic is way far away from the way that you normally use biodiversity. Where are those contexts? Which ones are they? You are also absolutely right that you can do this over time, and that I think is really important. There are a couple of different ways to do that, to get technical: you can either use a regular topic model and break it up over time, or you can use dynamic topic models, where you actually let the values of those distributions change over time, which is cool. Either way, I think looking over time is really interesting. It relates to another question that I've always been interested in exploring in DH contexts, which is about knowledge diffusion. You might a priori expect that the term would start less ambiguous, because it's coined by a group of scientists wanting to do sciencey things, and become more ambiguous as it's released into the world. Biodiversity might be so weird that in fact that's not the case, right? It may be that it was coined ambiguous and stayed ambiguous because the biologists already had, as we've talked about, these practical goals in mind, and so it was never not ambiguous; it always had a billion different meanings, even in the scientific community. That would be a very interesting empirical question to try to get a handle on. And so, yeah, you're right. I didn't mean to say that it's just the value of that equation for the word biodiversity; that in itself is somewhat interesting, but I think it's more that once you have a quantifiable method, you can start to play around the edges and pull out threads and hopefully look at some more fine-grained structure here. That's what I would want to do, anyway. I don't know if it would work or not, but hopefully that's helpful. You want to... we can bounce 
or you can keep going; it's up to you. No, I'll just put in a quick question, maybe because I didn't completely follow. I really like the idea of trying to measure or quantify the amount of ambiguity of a certain word or concept. My worry, which may be unfounded, was that what you're measuring here may not be the ambiguity but more the generality of the term. Interesting. Obviously, if the term is very general, you're going to find it in all kinds of topics, and it's going to be widely used with high probability. Very good, that's really important. And it kind of goes with what you just said: it feels a bit strange that if a word is common and is used across various topics, then just by that effect the word would become more ambiguous. It seems kind of... Yeah, yeah, no, that's really nice. There's a sense in which I could have expressed this point in those terms as well, right? That is to say, something serving as a bridge between lots of different topics, but always meaning the same thing, is just a general term. It's not an ambiguous term. And I think this is a place where there's really nothing to say but: look, you've got to back up your digital stuff with close reading. I'm always at pains to say this, and it's good; I wouldn't have wanted to go, whatever, two hours on YouTube without making this point, because every time I talk about DH I feel like I have to underline it. I am not, and I don't think almost anyone is these days, advocating for throwing out close reading. And I think this is a place where you have to do exactly that: close reading. If you see a high putative ambiguity, I think you're going to have to unpack what that means. But also, again, I'm hopeful; what I would like out of this is help knowing where to look, because in essence part of the issue here is that this is just 
sort of an unbearably vast idea of a project, and so a lot of what I want the computer to help me with is really just: what's the right point? I suppose you would answer exactly the same thing to this, but let's imagine it works. So you have some kind of metric of ambiguity that works, and the contexts are just right, not too broad, not too narrow; it's not a mess, it works. But there's a part of the management literature that you talked about that is really important: it's about the strategic use of ambiguity. So there's some kind of intentionality. You want to convey something, and the author decides how ambiguous to be. It's not just that a community defines, blah blah blah; it's also the individual writers trying to do something in the text. And I suppose for now we have no idea how to detect that automatically, but close reading will probably give it to us. The one thing that I can think of, and I do think that this would be important, is that as well as correlations across time, this gives you a way to track uses across fields. One way to at least start to get a handle on that would be if you could see, for instance, that within what feels to you like a single kind of disciplinary tradition, or a single way of approaching the field, there are really stark mixes, across different documents, of different topics that invoke biodiversity in what feel like different ways. Then you might sort of think there's something rhetorical going on. That's not a necessary condition. It would be a way to detect it; it could be a clue. Yeah, it's a sort of empirical evidence that people are not taking on a definition as part of their disciplinary presuppositions, but playing with them as it makes sense. But you're right: if you don't see that, that doesn't mean that there's not something strategic going on, and 
that's always hard, and I've written about this elsewhere. One thing that I think is of course important as well is that it's not as though that problem goes away if you're just talking about journal articles, right? Scientists are engaged in very serious rhetorical pursuits when crafting papers. If you are in the game of persuading your colleagues of something, ambiguity is a tool, and you use it voluntarily. It's not just miscommunication; it's part of the game, because you want to convince the others that gene, blah blah blah, right? And our colleagues in sociology show that ambiguity is part of the toolbox for convincing other people that you're right. Yeah, if I'm remembering rightly, and it's been a while since I've read it, one of the things that's mentioned in the old Griesemer paper on boundary objects is exactly that kind of usage: it was in part a rhetorical play, a kind of jockeying for position. You know, the wars over whether molecular biology will take over biology; that's kind of part of what's at play there as well. Yeah, absolutely. So no, I think this is very important. And this is part of why, in some sense, I think my history so far on this stuff is very weird, because I have right now a lot more work at the meta level than I do at the practical level. I've spent a lot of time trying to think about why to use these things and what they are really going to tell you, because I'm scared about these kinds of questions, frankly. And I think one thing that we need, and this is a lesson that we learn from our colleagues over in linguistics, is a better understanding of how we get from language to concepts, and of what the rhetorical goals of scientists are in crafting articles, and all these kinds of structuring questions about what the real content of a paper is. And there are a 
number of actually radically incompatible theories in the, you know, science studies and, a little bit, the phil sci community about why you write a journal article. What's it for? What are people trying to do? And of course the answer can't have nothing to do with communicating true propositions about science, because then scientists wouldn't bother to read papers. And as anyone who spends time in a lab knows, the journal club is a very important part of the weekly life of any science lab, right? Scientists spend ludicrous numbers of hours every week reading articles, and so they've got to be getting some propositional content out of them. But what else is going on is less clear. And I tend to think, and this fits exactly with your question, that our answers to that question are actually going to have a lot to do with our answers to what kind of philosophical content you can get out of doing digital stuff on science. I think we often have, and I happily accuse myself of this too, a bit too naive an idea about the kind of propositional content of a science paper: that it is just going to tell us about how scientists think and what they believe and how they work, and that we just kind of take the terms at face value and crunch the stuff out and we get the conceptual content. I'm cognizant of, and worried about, that not being the case. And then of course that's in science. If you have, say, a corpus that's got a bunch of, I don't know, IPCC reports in it or something, right, that's going to get really weird. If you've ever read segments of those, they're rhetorically very strange objects. They have to speak their own weird language, for good reason, with a level of caution that not even the most cautious scientist would take in writing a journal article: a phrase that expresses each level of credence in the proposition that 
follows it, like little markers for "I am 62% confident in P". And they can write that in IPCC reports; they have this whole table of probability values and phrases, literally. Not even the most uptight scientist is going to write that carefully. In the book Risk Expertise that I edited, there's one of the economists who uses this language; there are a few pages about it, and it's super weird, really weird. And now they've moved to fewer numbers, more words; they discovered that people react better to words. But how do you say very probable, substantially probable, partially probable? It's wild. And so, yeah, I am fully open to the possibility that if I drop one IPCC report, say, into a topic model, it just explodes, because the rhetoric is so different that you have this cluster of everything else and then this one document by itself, off in the middle of nowhere, right? That is actually really possible. And like I say, I don't know how to handle that. Try it and see what happens. I'd like to be confident before I start the project that it might work, but I don't know how to do that other than to just try and see what happens. So I have a lot of, I guess, comments, thoughts and questions, and we can see what is interesting. First, maybe an application of what you say in the talk itself. I was quite curious about this small point in the citation you had, where you, or the author that you quote, used the word non-normative, or "as non-normative". And you translated that as not being good or something, or not being useful for normative reasons, which is not at all what the word means as I understand it. So there seems to be some ambiguity going on. Maybe I misunderstood the citation, but whatever was there seemed clearly normative. As I parsed what they meant, I think what they mean is: as in not being a norm. It is not a norm of good communication; it 
should not be held out as a norm. Yeah, but that's something else. I think that's what they mean, though, as I decoded the article anyway. I spent like 10 minutes with the paper when I was first reading it, trying to make sure. What is the difference between non-normative and not the sensible standard? Because it seems that you say it's the same thing. I think maybe non-normative is future-directed and standard is past, or is past-evaluative. I'm not actually sure. I have an acquaintance who knows this literature better than me, a third-hand acquaintance whom I've been meaning to talk to more about this stuff. But yeah, very fine; that's my decode anyway. It's certainly not how we mean it. And this was just like half a joke, but: ambiguity in practice. I have a more substantial question that relates to the previous questions, I guess. So indeed, what you track will sometimes be generality rather than ambiguity, and maybe you can filter that out in some ways. But there's also something in ambiguity that you don't track, it seems, and it's maybe the one kind of ambiguity that's more dangerous: where people really are talking about the same things, they are really using exactly the same words, but they mean something different. They have deep debates, typically in philosophy, when you have deep debates about concepts. In this literature, people from the different sides of the debate will interact and use exactly the same words in order to be able to interact, maybe even negatively: this person claims that mental properties are associated with causal stuff and so on. But in a coding way, the pro and contra papers are going to have exactly the same syntactic profile. Exactly. That's a kind of ambiguity that is maybe more dangerous than somebody applying biodiversity more to operational stuff and somebody else looking at it from a very theoretical perspective. Those people will use completely different words, but they are not deeply or not problematically 
ambiguous. They might sort of deeply refer to a very similar concept but do it in a very different way, and that may be the kind of ambiguity that actually is positive. And so, yeah. Yeah, no, for instance, a good short example: if you fed a giant metaphysics corpus to this thing, the ambiguity of the term "property" would probably be very, very small, because everybody, I mean, of course there's no agreement about what it is, but everybody talks about it the same way. We all have the same argument with the same words. Yeah, that's a really important and a very nice point. All I can say is, now, this is a hunch, and what's bad about this is that it's a hunch. So given that I've just spent like an hour saying that part of the point of what I'm doing is that I am tired of having hunches that I don't know how to confirm, let me give you a hunch that I don't know how to confirm. My hunch is that the debate in these contexts is just not sophisticated enough to converge in that kind of way. I mean, of course you would detect it empirically, right? It would appear as a weirdly small absolute level of ambiguity in the sense that I developed here. And that's a possible outcome. I find it very unlikely, and again, it's an untested hunch, so not really, I want to say, unconfirmable: I can give you a low value, or the presence of that low value would be vague evidence that this thing was happening. I tend to think that any corpus broad enough to capture the kinds of phenomena that I'm interested in will be too diverse to converge sufficiently. But I think you're right, it's actually a very interesting point, and it's something that I should think a bit more about at a higher level of generality. That is to say, this kind of methodology is just going to be frankly inapplicable to certain kinds of dispute, and I think it's a good idea for me to sit down and try to characterize more precisely, like,
what kind of dispute is that. And maybe, this literally just flashed into my head, so I've not yet thought about this at all, but maybe I could try to do something like driving a wedge between a notion of ambiguity and a notion of disagreement. So I might just want to say that what philosophers are doing is not being ambiguous but disagreeing, right? And maybe there's a profitable way to distinguish those two phenomena and to be able to test them. Robustly being sure that you're seeing one and not the other, though, is the interesting challenge. You're exactly right, I need to think harder about it. It didn't occur to me, just because it never crossed my mind that the kinds of corpora that I'd be interested in would even possibly have this feature. Although as soon as you say it, like I said, philosophy is a great example where obviously none of this would work at all. I have more questions, but maybe other people want to go now. I could maybe add to what you say. Perhaps philosophy is a very special case in comparison to biodiversity. Because of course you're going to have papers and debates which debate what diversity means, but I think, and this is also a hunch, that most of the uses are a particular use of the concept of biodiversity; they're using it in a particular sense. So you're perhaps not going to see ambiguity within one of those specific topics, because the debate is, at least, not explicit. So there is one specific use, one specific, I would say, concept of biodiversity. So it's not... Use-mention is gone in these kinds of analyses too, of course, which is a notable problem, right?
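The worry raised in this exchange, that papers on opposite sides of a debate can share the same word profile, can be illustrated with a toy sketch. This is not the method from the talk (which relies on topic models); it is a plain bag-of-words cosine similarity, assumed here only because it is the simplest word-distribution measure. The three sentences and all names (`bag_of_words`, `cosine`, `pro`, `contra`, `operational`) are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "papers": two that disagree, one in a different register.
pro = "mental properties are causal properties and have causal powers"
contra = "mental properties are not causal properties and have no causal powers"
operational = "we counted species per square meter in forest plots"

# Opposed claims, almost the same vocabulary.
sim_disagree = cosine(bag_of_words(pro), bag_of_words(contra))

# Different register, no shared vocabulary.
sim_different_register = cosine(bag_of_words(pro), bag_of_words(operational))

print(f"pro vs contra:      {sim_disagree:.2f}")
print(f"pro vs operational: {sim_different_register:.2f}")
```

Under these assumptions, the two sentences that flatly disagree come out nearly identical, while the sentence in a different register shares no words with them at all. That is the sense in which a word-distribution measure would register explicit disagreement as low ambiguity.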
I think it's not as antagonistic as philosophy, that's what I'm trying to say. You obviously know the literature much better than I, but this seems impossible to me. I mean, it happens in philosophy more than in that field, of course, but I imagine these people do have this kind of debate: some say biodiversity, you have to treat it as objectively as possible, and other people say you've got to look more into the effects of it, something like that. They do exist, but I feel like it's a different kind of article, I think, more like opinion pieces or reviews or discussions. Then you're going to have a lot of very technical papers. But that connects to a nice point about corpus construction that I hadn't thought of: actually excluding that kind of thing is probably a very good idea. It's not going to be easy, because sometimes it's not under the title of review article, but I think, at least in the literature I've seen, you do see those two roughly separated. That's a very good point. But when it comes to social sciences, they debate the sense of terms much more, maybe, than in a certain kind of biology. So I'm not sure it's science versus philosophy; maybe it's kinds of science, moments of science. I hand-waved at this: things are going to get weird if you have the same authors in the same papers using multiple contexts of this kind of thing, mixing everything together. Review articles may really be a mess. Some of these models, and I don't have this ability in the kind of system that I use, some of these models actually have it set up where you can go look at the text and actually ask it: so this particular instance of the word "biodiversity", which topic did you think that got pulled out of? That one, right there. I've seen papers where people have produced figures that look like that, every word with a little box and a number around it. It's like, oh, that's kind of cool. In fact, I'm not exactly sure how that is supposed to work. I do not believe that the kind of
inference method that is implemented in the software packages that I use gives you that level of grain. Yeah. I have kind of a wild suggestion of how to also track this, and I don't know whether it makes any sense. It can't be any wilder than this algorithm that I really just came up with yesterday. It's much wilder and probably more intractable, but still computationally operational. So the idea would be that these concepts have a historical development; they are not standing on their own in a paper. And maybe you get a much more nuanced view on the ambiguity if you look at the history of the word: to which papers do they refer, and what are the other words associated to it in those papers? And then track a line or a tree or something like that, a historical tree, to find the conglomerate of usages that are associated. So I imagine that if somebody uses "diversity" in a certain sense, it's usually because they refer to a certain literature that also uses it in that way, and that it tracks down to probably some conglomerate at some point, but it spreads wildly. And I guess you can find the ambiguity by the distance, not the quite coarse-grained distance that you refer to, but more fine-grained, by looking at how far apart they are in this wild genealogical tree of the development of these conglomerates. I like this idea. There will be one problem, and I don't know how strong it will be; this is again an empirical question. So I haven't done a large amount of citation network analysis. Some with Luca? Yeah, that's what I thought, yeah. So one thing that I have heard people say in this context is that there actually is an intuition that we all have that's usually wrong, and that is: citation networks are a whole lot sparser than you think they are. Everything's a lot more isolated than you think, which is kind of surprising. I mean, I share it too, right? You have this intuition, like,
co-citation networks shouldn't be that hard; getting to a last common ancestor for a paper ought to be pretty close, right? It's harder than you think. There's a lot less overlapping citation in the network than it seems like there is. Which is weird: if I use a concept, I usually refer to people who also use the same one. Sounds right. And so, I mean, I don't know whether that would be enough of a problem to pose a problem, right? I don't know if that's actually an issue that would make this not work or not. It might not be, to the extent that it mattered. But that's a really cool idea. This connects to something that I didn't talk about here; I thought about putting it in and I took it out. One thing that kind of went away in here, right, was individual documents. I mean, you have that level in this representation, you can go back to documents somehow, and I'm not yet sure how. This is a possible way to do that, a possible, interesting and fun way to do that. Because I'm not sure how to connect the kind of stuff that I've been hand-waving at with, you know, say I want to get insight about particular papers, or the kinds of things that you already knew and didn't need this kind of approach to tell you, which is, you know, what are the most important papers in this subject, in this journal, in this time. That's usually a pretty easy thing to already know if you have the background knowledge that you need. And so that could be a fun way to reconnect to the individual document level stuff. I'll have to think about that. The best thing about biodiversity is that, since the topic only dates from, whatever, 1990, you can get pretty good citation data for the entire history of the topic, or of the word, which is pretty cool. I can't do that with most of the other things I work on. Yeah, I mean, citation data is garbage before about, well, it's really garbage before about 1950
because people didn't always cite the same way that we do now; citation didn't mean the same thing. But even with all the caveats that you just said, I would be curious to correlate your analysis, the context metric, with the length of the co-citation network, just to see if there's something. Yeah, as an exploration. I'll have to poke around and see if somebody's already played with this kind of thing, because I know there are people... I mean, at the super macro level, I know you can do similar kinds of things: you can cluster the citation networks, and basically if you do it for, you know, all of science, you roughly see disciplines form as islands in the network. So at some kind of macro level we know that this works, which makes one think that somebody would have tried to make it work at a more micro level too. Very small citation networks are used to see the impact of our research, our particular research of the brain, on two sub-disciplines inside a very narrow discipline. But I would be curious to see how it's correlated with your context analysis, because maybe you can see, it would be quite anecdotal, but still, in the diffusion of something that is not very long, because you're right, these trees are not as big as you would think, whether there's a complete change of context or not. I mean, this is something that I have wanted to do for years and just haven't gotten a technical handle on yet. And one thing that is cool about the kind of work that I've been doing: I've been working with full article texts, which most people don't do, because most people didn't bother getting the copyright access negotiated. What I haven't done, that I think would be really neat, is to think about how you can induce networks of meaning and similarity from text and compare them with the induced networks from citation. I think it would be really cool. I don't think anybody is playing, well, I think obviously people are playing with this, but I
think it would be really neat. I think it could help you answer, as you say, all kinds of really interesting questions about what happens in these cases of knowledge diffusion. And I just haven't, well, I haven't gotten the citation analysis prowess up to speed yet that I would want. And with very clean corpora. Yeah, there's a problem if it's not clean beforehand, because people can routinely make mistakes in names. You need a clean corpus. And it's the kind of thing where you're going to want to be able to... After a lot of experience with very dirty corpora, because most of my text is nasty, I think it's fair to say, the way that I usually describe that is: corpus cleanliness, or rather corpus dirtiness, doesn't stop you from drawing results, but it does stop you from drawing conclusions based on small sample sizes. And that's exactly what you want to be able to do in that kind of context, either to say that there's exactly four of these in the corpus and that's meaningful, or, more to the point, what I can never do because my corpus is too dirty: I can never say that there's none of something, because I never know whether the absence of something is a noise artifact. I would never go on record saying, you know, nobody in 1948 talked about the term X in the journal Y, because I just don't believe my data well enough to make a categorical claim like that. And that's exactly what, yeah, as you said, you need: you need to be able to make those small-n inferences. On the other hand, if you work with full text and you have a dirty corpus, you can still do some kind of simple analysis to convince yourself that even with the dirty corpus it's not garbage, right? But for citations, it's not big enough. Yeah, if you miss an important link between two communities... there's the diffusion of the article of 1905, because this is the example I have in mind. Oh yeah. Where they prove, they point out, that it has no impact at all,
so even if you discover something about it, nobody will... But these are small networks, so if you miss two links at a strategic time, because people put two n's in a name compared to one n in German... Yeah, you need a clean corpus. Yeah, yeah. And again, you're trying to prove a negative, so of course it better be shiny. Yeah, yeah, no, and I mean, it's always been one of my constant background challenges, playing with messy data, because I just know that I'm not going to get anything better. So, you know, it is what it is. Yeah, no, no, it's a question, I just want to finish it. So I wonder what is or could ever be the purpose of defending a concept that is sufficiently ambiguous. Because if I think about the first principles of philosophy, the Cartesian principle is to render concepts or thoughts or ideas as clear as possible, so we consider that a philosophical virtue. So if you defend something that could be sufficiently ambiguous to use in this and that context, either you're changing your first principles of philosophy or you're doing something otherwise virtuous. You were not convinced by the list of the good? Well, I mean, no. Can I already ask a question that might be related? Like, first, what is wrong with defending biodiversity as the number of species per square meter and saying that everything else should have another name? Yeah, no, and I mean, the short answer is nothing. I think there is nothing wrong with that as a move, and there are a bunch of very credentialed biologists who absolutely would want to make that kind of a move. So on the one hand, I don't see anything wrong with that. On the other hand, I think actually what I want to do is take the other... or, if you like, I do want to say in this case, I mean, I'm not bringing a classic, traditional kind of conceptual analysis of biodiversity here. I don't think that's what this project is about, in part because I think the
pragmatics of engaging in debates around biodiversity right now... I want to say this: as pretentious as I know that I am, I'm not pretentious enough to say that I think I'm going to convince everyone involved in 21st-century biodiversity protection, after reading one or 12 papers by me, to adopt the sole correct definition of biodiversity. And so in that sense this is not traditional philosophy; this is something with a different collection of virtues, right? I am forced to respond to the facts on the ground, and the facts on the ground are that this seems like, again with the caveat that I take this to be an open empirical question, a problematically ambiguous concept as it's actually being used. And that's analyzable, right, even if it's maybe also lamentable. And I mean, perhaps fair, you could just lament this state of affairs, but it's an analyzable and comprehensible state of affairs. And so I think the kinds of conceptual analysis tools that we have as philosophers and scientists can be deployed against ambiguous concepts. And so, yeah, I think it's worthwhile to try. At the end of the day, might it be better if we could wrestle everyone into consensus? It might, I will give you that. But I still think there's interesting and exciting analytic work to do in the absence of that kind of consensus. I mean, there could be some interesting ambiguity to describe, as there is for many concepts, like analytic truth or knowledge. But I don't think biodiversity... I mean, there's no reason yet to consider biodiversity to have an ambiguity of that sort. I think it could still be very much dissolvable into what you want to represent with it. I think part of the argument that there is an interesting ambiguity of that sort, and this is something that I didn't talk about today, part of the argument that there is an interesting ambiguity of that sort is
that, well, I'll put it this way: I think interesting ambiguities of that sort are induced in concepts when part of what leads to those ambiguities is value differences. I just take that from the values in science literature over the last 25 years. I think that's actually a really cool, interesting, philosophically analyzable kind of difference in the signification of concepts, when they're induced by underlying value differences. And a kind of background behind the hunch that I have, in fact I do agree with the hunch of everyone else that the term is ambiguous, behind that hunch is a hunch that there are these kinds of underlying non-epistemic value disagreements over how the concept is being used. And that, I think, is the kind of thing that philosophers are really good at taking apart and can contribute to illuminating. Maybe in the end, I think you've got to admit, if you're in my position, that maybe in the end either, as I already said, it's less ambiguous than we think, or maybe what happens when you see ambiguity and you explore the contexts where there is ambiguous usage is that you actually just find people saying lots of really dumb things about biodiversity. I actually think that's a very positive outcome. I'm not entirely sure what I would do with it yet, if that actually is what's going on. I think that's actually quite possible, just kind of nonsensical stuff. But we have to figure it out first. I think the first step is to understand, map, explore it, and then think about conceptual interventions, conceptual engineering, consensus building, etc. Do we still have time? I don't know if I'm giving you time. Well, it's almost two. But maybe you are... I don't know, you're the president. Okay, thank you.