Great. So I'm Joey Lovestrand. I'm a postdoctoral fellow at SOAS, University of London, and happy to be able to host this linguistics webinar today. Our speaker is Dr. Nala Lee. Nala is an assistant professor of English language in the Department of English Language and Literature at the National University of Singapore (NUS). Before that she did her PhD at the University of Hawaii at Manoa and also spent a year at Stanford University as a postdoctoral fellow. Nala has worked extensively on the languages of Singapore, including Baba Malay and Colloquial Singapore English, and has more generally worked in language documentation and description, linguistic typology, contact languages, and language endangerment. She has agreed to share today some of her most recent work on the spatial network properties of endangered languages. So Nala, thank you for agreeing to be with us today and to share this research still in process. We look forward to hearing what you have to say.
So thank you for the invitation, Joey, and thanks for organizing this in general. Let me share my screen. Okay, we begin over here. So hi everyone. Thanks for coming to today's talk on the spatial network properties of endangered languages. Clearly, with my day job as a documentary linguist and as a creolist, I'm not an expert at network analysis. So this talk was only possible because it was born out of collaborative efforts with my colleague Dr. Cynthia Siew, and I think I should acknowledge her name over here. She is a psycholinguist and a cognitive scientist who utilizes network analysis in her work on cognitive structures. So she mainly works on the mental lexicon. But we were talking, and our broad interests align roughly in the study of language patterns. So she decided maybe we could apply our tools here to see if this was meaningful, and we came together to do this work. And here we are interested in the spatial patterns of endangered languages.
So Cynthia wasn't able to make it today, but we are excited to share this work. Apologies in advance if I do not handle the methodology questions as well as I should. They can be directed to Cynthia at the email address that you see over here. I will try to present the methods as simply as I can, as they were introduced to me a couple of years ago when we first started to get together to do this work. It's kind of interesting what happens when you go climbing together and you end up coming up with a paper like this. So this is a result of that.
Okay. So let me try to contextualize what we are trying to do here today. I think it needs no belaboring that language endangerment is occurring at an unprecedented rate and magnitude. You most probably know this, but it's worth bringing up each time. While the issue of language endangerment had been broached as early as the 1940s by Morris Swadesh, there is no denying that language endangerment is now occurring at a more unprecedented rate, also because we are more aware now of recent data that allows us to establish more accurately the rates at which it occurs. Of course we are familiar with the numbers: 50 to 90% of the world's languages becoming moribund by the end of this century. But there's also more recent data from the Catalogue of Endangered Languages, which suggests that one language goes dormant every three months or so. I use the word dormant here to mean languages that have gone extinct in the last 50 years or so, or languages for which we are not sure if there's still a last speaker, or not sure about the demise of that last speaker. Regardless of rate, we are also aware of the immense consequences of language loss for the speech community and for humanity. It affects the social life of the speakers and their well-being. It also affects linguistic diversity and knowledge for humanity in general.
So there's no belaboring it, it's an important issue. Over here, we hope to shed light on the issue, in particular on how these languages pattern spatially. We can explore spatial relationships between languages with the crucial assumption that came out in the work of Lindell Bromham, Felicity Meakins, and their co-authors. The assumption here was that languages that cluster in space share similar sorts of social realities in their shared environment. These would be socioeconomic and historical factors, among others. And the approach that we take here is a quantitative one involving computational spatial modelling. The data that we utilize come from the Catalogue of Endangered Languages, known in short as ELCat, which is meant to feature all languages that are known to the ELCat team to be at some level of risk. When we were utilizing the data in it, there were about 3,423 languages featured on ELCat. The current version has got 3,456 languages. If anyone's interested in the changes between then and now, it's about an increase of 23 languages from the time we started utilizing this data set.
We were motivated in some way by the gap that there clearly was in terms of what had been previously done. While there has been work that situates endangered languages in space, much of this work took the approach of demarcating the languages by known regions, for example by treating the languages of Australia as a group or by viewing the languages of New Guinea as a natural group. The issues of language endangerment have also been conflated with talk of linguistic diversity and understudied languages. This was done in the work by people such as Greg Anderson, for example. And it's also been subsumed in discussion of biocultural diversity by people such as Loh and Harmon, in what appears to be a twin-track model of loss even.
So the loss of languages then implies the loss of cultural diversity, which supposedly occurs alongside the loss of biodiversity. Discussing these issues in the same breath is not a bad thing at all. It's a good thing. It brings its benefits, as language endangerment then gets a lot more attention, since the reality is that the endangerment of other entities such as mammals would be much higher up on the radar of most people than that of languages. So when you talk about these things together, people start paying attention to what the issue is. And when tackling these issues, shared resources are technically an advantage in a world of limited resources. Yet at the same time, there's also work suggesting that these concepts can be decoupled, and that doing so would allow us to uncover a lot more about these patterns of endangerment. Turvey and Pettorelli, for example, show that while there are significant positive relationships between language richness and human population, as well as between language richness and species richness among non-human mammals, at different spatial scales in New Guinea, there's also a significant negative correlation between the distribution of threatened languages and threatened mammal species. So perhaps we want to pull apart those concepts and just look at language endangerment on its own for a start.
In this paper then, we attempt to do that: strip away everything else and look at language endangerment hotspots on their own. We do not pre-demarcate areas of endangerment, but rather allow the data to speak for itself by using quantitative computational means. The methodologies utilized in this paper fall under the branch of network analysis. So it might be good to quickly talk about what networks are like in the framework that we're utilizing. Very simply, you can think of a network as a structure that represents a group of entities and the relationships between these entities.
So that's what's represented: the entities and the relationships between these entities. As drawn in this simple diagram that I've coughed up, a node here would refer to the entity that's being analyzed, and an edge, which is the technical term used for those connecting lines between the nodes, would represent the relationship that we are interested in analyzing between these entities. For example, if you are studying the social media network of Instagram or TikTok or Facebook users, then the nodes would represent those users themselves, whereas the edges would be the following relationships. So that would be what the social media network looks like. And if you are studying how COVID spreads, then each node might represent an individual, and the edges can represent the domains of interaction, such as the sphere of the family, the school, the workplace or even the marketplace, among other things. Any sort of network can present itself for analysis, including social networks as I've mentioned, traffic networks, and networks of the mental lexicon as in Dr. Cynthia Siew's work. And there are many ways in which these nodes and relationships can be analyzed.
Here we are concerned with the spatial network of endangered languages. We have constructed the spatial network using two pieces of information that are crucially featured on the Catalogue of Endangered Languages. These include the level of endangerment that each language is at, so its endangerment status, and the language's geographical coordinates in terms of its latitude and longitude. Each node as you see over here would then represent the location of each language, and the edges that you see over here would represent the haversine distance computed from the latitudes and longitudes.
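The network construction just described can be sketched in code. This is a minimal illustration, not the authors' implementation (the talk states the analysis was done in R). The language names, coordinates and statuses below are hypothetical, the Earth radius is an assumed constant, and connecting every pair of languages is an assumption, since the talk does not say whether a distance cutoff was applied.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

# Hypothetical nodes: name -> (latitude, longitude, endangerment status)
languages = {
    "lang_a": (1.35, 103.82, "critically endangered"),
    "lang_b": (3.14, 101.69, "threatened"),
    "lang_c": (-54.93, -67.61, "critically endangered"),
}

def build_edges(langs):
    """Return {(u, v): distance_km} for every unordered pair of nodes."""
    names = sorted(langs)
    edges = {}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            lat_u, lon_u, _ = langs[u]
            lat_v, lon_v, _ = langs[v]
            edges[(u, v)] = haversine_km(lat_u, lon_u, lat_v, lon_v)
    return edges

edges = build_edges(languages)
```

Each node keeps its endangerment status as an attribute, so the later analyses can relate network position to status.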
So haversine distance is something that's usually used for calculating geographical distances between various points on earth, and it's also known as the great-circle distance. It represents the angular distance between two points on the surface of a sphere. So it's good for analyzing these locations of languages spread out across the globe.
Very quickly, I think it's important to introduce you to the source from which the data is taken, for a better understanding of the nature of the data. The Endangered Languages Project is accessible online at www.endangeredlanguages.com. You may already know about it, but if you don't, it's a platform that was conceived initially to allow, and still does allow, the sharing of information and resources on endangered languages. This was a project that was initiated by google.org, not google.com but google.org, its supposedly philanthropic branch. But now it's fully run by linguists and people who are stakeholders, people who are just interested in saving languages. The languages that are featured here are languages that are known to be at some level of risk. The platform has in general been useful for raising awareness about language endangerment: where it occurs and what it looks like in various places. And users are not only encouraged to access the information and the samples that are provided by partners, they are also encouraged to submit information or samples in the form of text, audio or video. So when it was conceived, it was thought of as a shared platform for everyone. And one of the reasons why was that it was meant to be continually updated. If people knew that there was something inaccurate on the website, they could write to the people handling the project and that information could be updated. Of course, everything is subject to review by the area experts.
So one of the central features of the Endangered Languages Project, or ELP, is the Catalogue of Endangered Languages that I previously mentioned, known in short as ELCat by the team. ELCat's aim is to provide up-to-date, reliable and comprehensive information on endangered languages. One thing that struck us when we first came up with the idea of doing this was that there are a lot of resources available, different sorts of websites and databases, but a lot of them were not very well maintained. The information in them was taken from the 80s, speaker numbers and census numbers as well, and so on and so forth. So we wanted to provide something that was more up-to-date. It was produced by a team at the University of Hawaii at Manoa together with a team from Eastern Michigan University, with grants from the National Science Foundation in the States as well as the Luce Foundation. The initial technological assistance was rendered by Google.org. The platform is currently managed by the First Peoples' Cultural Council, a team at UH and a governance council, with the data being maintained mainly by the team at UH.
A feature of ELCat that is important to us for this paper is the Language Endangerment Index, or LEI as we call it, because it came out of Hawaii. No, it was a coincidence that we called it LEI, but it did come out of Hawaii. This was developed for the purposes of ELCat, and it's supposed to establish the extent of endangerment for each language. The reason for doing so, let me get to that a little bit later. Some of you may have come across ELP, but just in case you haven't, I find it useful to share what it looks like on the website. When you access it, you view a map of the world, and the endangered languages are represented as dots. Here I have a view of the area surrounding Asia, just because, but you get a world map.
And these different colored dots correspond to the various levels of risk that the individual languages are facing. As long as we have the geographical coordinates, we can put a language here. As is clear from this, that's one piece of information that we have for many of these languages: their geographical coordinates. And depending on whether that information is available, more regarding the language's level of vitality or endangerment, because vitality and endangerment are like two sides of the same coin, and information about the social circumstances are also represented at different levels on the site. So what do I mean? I'm not quite sure if you can see my cursor over here, but if you hover your cursor around Southeast Asia, around the Malay Peninsula, then you might see this pop up. So that information on Baba Malay, that it's critically endangered and that it has 2,000 speakers, pops up. And then when you click on the individual languages, you get a more extended set of information, including how the language fares on particular scales. Over here, we have domains of use, we have speaker number trends, and we have intergenerational transmission, whether or not the language is being passed on to younger speakers. Then of course, there's other sorts of information as well. I had to screenshot this and fit it in nicely, so there's a little bit of information that's missing, but you can see that, if it is available, there can be information such as whether or not there are young children who speak the language, so zero, whether there are young adults who speak the language, zero, and that it is mostly the elders who speak the language. This language also has, for example, no government support and no institutional support, and there would be other sorts of information that this is populated with.
Where qualitative information that's helpful for establishing a language's level of vitality is available, that would also be shown. And of course, if people upload their video clips or audio clips or information about the language, then you can see that information at that level as well. But importantly over here are those three scales that you see and the speaker numbers, because those help us establish the level of endangerment or vitality on the Language Endangerment Index.
Okay, so one of the major data points of this study is the geographical coordinates, which I can't show you how it looks like on the database, and the other major data point that we're looking at is the level of endangerment that each language is at. So we have to broach how the Language Endangerment Index, or LEI, works in a very general way. Why did we have to come up with an index when there are other ways of assessing language vitality out there? We found that a lot of these methods required very specific types of information that we did not necessarily have on many of the world's languages. And some of these methods of assessment did not allow for the languages to be looked at in a comparable way. So we wanted to come up with something that could be utilized regardless of the amount of information that was available. What do I mean by this? LEI is meant to provide a level of endangerment for any language based on four criteria. These four criteria are not only deemed to be important, but they're also found to be much more easily available than other types of sociological information. The data also has to be comparable on a scale for the kind of work that we were doing.
So when my colleague John Van Way and I were developing the scale, we found that there were other types of sociological information that would clearly affect levels of endangerment, such as language attitudes. There's no doubt that language attitudes are very important when you're talking about the viability of a language. But at the same time, that was one piece of information that was not readily available. If there was any information that was most readily available, it would actually be speaker numbers, because a lot of linguists who were doing work much earlier on did not say very much about the sociological circumstances of the language. They were relying on census reports that were made available, government census reports or census reports carried out by other agencies. But a lot of these descriptions lacked information such as language attitudes. And language attitudes at the same time are much harder to quantify. So for those reasons, these are the four criteria that we are looking at when we talk about LEI.
The first of these four criteria is intergenerational transmission, without which we know that there's no viable future for the language. So we are taking a leaf out of Fishman's notion of disruption to intergenerational transmission. Because of how important it is, it's doubly weighted on our scale. Then the absolute number of speakers is yet another important factor. For the reason that I just mentioned, sometimes there's no information on the language itself except for speaker numbers, because of the census reports that are available. And while some people might question this, because there can be strong intergenerational transmission in small communities and possibly weak transmission in large communities, there is no denying that very small communities are much more at risk than very large communities. So speaker numbers are featured here.
But to mitigate that, we also have speaker number trends. When we are dealing with speaker number trends, we are interested in whether numbers are increasing, decreasing or stagnant, based on a few reports over the years, if you have that information. Finally, we have domains of language use, again following up on one of Fishman's initial ideas. Here, the assumption is that languages that are used in more domains are less threatened than those used in fewer domains. If you use a language at the marketplace and at work and at school, that language will most probably outlive a language that is only used in the home domain. So that's kind of natural. The language that's being assessed is scored on these scales, and the final tallied score corresponds to a particular level of endangerment, ranging from safe to critically endangered, with vulnerable, threatened and severely endangered as labels that are in between. What LEI does differently from other assessments of language vitality is that not all factors have to be utilized, given that perfect information is often rare. Rather, a certainty score can be given based on the number of factors used. I'm not going to go more into it, but if you're interested, there's a resource over there that I've cited. Note also that dormant and awakening are labels that are used by ELCat, but these are not scores that are operationalized on the scale. They're just there.
Okay, phew. So that was the data. Using that data, with the geographical coordinates and with the endangerment statuses, we then constructed a spatial network of endangered languages computationally. We analyzed this structure at three different levels. At the macro level, we were interested in what the broad patterns were, and we undertook this investigation by looking at assortative mixing patterns.
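The LEI scoring described above (four factors, intergenerational transmission doubly weighted, and a certainty score that depends on how many factors were available) can be sketched as follows. This is an illustrative sketch only. The talk does not give the numeric details, so the 0 to 5 range per factor, the percentage bands, and the inclusion of an "endangered" band are all assumptions, and the function and key names are hypothetical.

```python
# Assumption: each factor is scored from 0 (safe) to 5 (critically endangered).
MAX_FACTOR_SCORE = 5

# Intergenerational transmission is doubly weighted, as stated in the talk.
WEIGHTS = {"transmission": 2, "speakers": 1, "trends": 1, "domains": 1}

# Assumed percentage bands: (lower bound, label), checked from the top down.
BANDS = [
    (81, "critically endangered"),
    (61, "severely endangered"),
    (41, "endangered"),
    (21, "threatened"),
    (1, "vulnerable"),
    (0, "safe"),
]

def lei(scores):
    """Return (percent score, label, percent certainty).

    `scores` maps factor name -> 0..5, or None when the factor is unknown.
    Unknown factors are left out of both the earned and the possible points.
    """
    used = {k: v for k, v in scores.items() if v is not None}
    if not used:
        raise ValueError("at least one factor is needed")
    earned = sum(WEIGHTS[k] * v for k, v in used.items())
    possible = sum(WEIGHTS[k] * MAX_FACTOR_SCORE for k in used)
    full = sum(WEIGHTS.values()) * MAX_FACTOR_SCORE
    percent = 100 * earned / possible
    certainty = 100 * possible / full
    label = next(name for floor, name in BANDS if percent >= floor)
    return percent, label, certainty

# Hypothetical language with no information on domains of use.
result = lei({"transmission": 5, "speakers": 5, "trends": 4, "domains": None})
```

The point of the design is that a language with only a speaker count can still be placed on the same scale as a fully documented one, with the lower certainty made explicit.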
I will elaborate on what these individual types of analysis mean later on, but for now, I'm just going to quickly run over the types of analysis that we utilized. At the meso level, we were interested in where natural clustering took place. For this, we utilized community detection analysis, to see where natural communities were to be found. We used the Louvain community detection method, and I'll again elaborate later on. Then at the micro level, we wanted to find out much more about the location of each individual language that was at risk. For this, we undertook an analysis of closeness centrality. Again, I'll elaborate later on, and then we can talk a little bit more about what the implications of this measure of closeness centrality are for other things like linguistic diversity.
Okay, so where the macro level analysis was concerned, looking at assortative mixing patterns meant that we were looking for a bias in favor of connections between network nodes of similar characteristics. The question, simply put, is how similar your neighbors are to yourself. If I'm a very severely endangered language, is my neighbor equally severely endangered, or is it a less endangered language? Here, the characteristic that we were interested in was the level of endangerment. A positive assortative mixing number would usually mean that nodes are surrounded by other nodes with similar characteristics. A negative number would mean the opposite. In the case of our spatial network of endangered languages, a positive assortative mixing pattern would mean that languages tend to be surrounded by languages that are similar in terms of level of endangerment. And if we had a negative number, it would mean that languages would be surrounded by languages that do not share similar endangerment values. Here we actually find evidence of positive assortative mixing.
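One common way to put a number on assortative mixing is to correlate the attribute values found at the two ends of every edge. The sketch below does that for a numeric coding of endangerment level. It is an illustration under assumptions: the toy graphs and the 1 to 5 coding are hypothetical, and the talk does not say whether the authors treated endangerment as numeric or categorical.

```python
import math

def assortativity(edges, value):
    """Pearson correlation of a node attribute across the ends of each edge.

    Each undirected edge is counted in both directions so the measure is
    symmetric. Positive: like connects to like. Negative: the opposite.
    Assumes the attribute is not identical on every node.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical endangerment levels (higher = more endangered).
level = {"a": 5, "b": 5, "c": 4, "d": 1, "e": 1, "f": 2}

# Like next to like: two tight groups joined by a single bridge.
like_edges = [("a", "b"), ("b", "c"), ("a", "c"),
              ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]

# Unlike next to unlike: every edge joins a high and a low level.
unlike_edges = [("a", "d"), ("b", "e"), ("c", "f")]

r_like = assortativity(like_edges, level)
r_unlike = assortativity(unlike_edges, level)
```

In the first toy graph the coefficient comes out positive, which is the pattern reported in the talk for the real network. In the second it is negative.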
So we had a positive correlation coefficient, and this was a significant value. All of this was done in R, and Cynthia wrote the code for this. What this means is that critically endangered languages tend to be surrounded by other languages that are critically endangered, and languages that are less endangered tend to be surrounded by other languages that are less endangered. So with this, we actually had natural groupings that occur. What the positive assortative patterns in our structure point to are these natural groupings, like languages with like languages. And this actually indicates the prevalence of language endangerment hotspots around the world. But of course, we want to know what those hotspots look like on the map, and this is where our next analysis leads.
At the meso level, we wanted to uncover communities of nodes that were more interconnected within those communities than outside of those communities. In other words, we wanted to know where the natural clusterings could be found. Again, the Louvain community detection method was utilized. We found 13 communities, and these communities were not necessarily directly associated with the continents they were on, for example, but we'll look at what they look like very soon. The community sizes varied, from communities as small as 11 languages to as large as 550 languages. And these communities had a modularity of 0.777, which indicates that the community structure is overall very robust in nature. So this is what our 13 communities look like, for a little bit of color and not so many numbers. From the left of this map, in the sequence in which we've numbered them, we have concentrated areas of endangered languages that correspond to areas in West Africa. And then we have another natural clustering in Southern Africa.
And then yet another natural clustering in East Africa, number three. At number four, we've got languages within Western Europe, the Middle East and Northern Africa clustering together. So these are the endangered languages in these regions. At number five, we've got the southern parts of Central Asia, the southern parts of East Asia, South Asia, and mainland Southeast Asia clustering together. It's kind of a big one. Then where am I? Okay. Then we've got number seven, the Philippine islands grouping together with Western Melanesia and Australia. So this is yet another natural clustering that occurs in terms of the patterns that we see where vitality statuses are concerned. And then Eastern Melanesia stands on its own, so the languages within it pattern on their own. Then we have yet another region within Micronesia, Polynesia and New Zealand. We've got one in North America and parts of Northern Asia, so it kind of overlaps over there. We've got a region in Central America, and then we have the southern part of South America and the northern part of South America. So these are individual clusterings that naturally occur based on how these language endangerment values pattern.
The different stacks that you see here at the bottom of the map show the different compositions of each community. What you can see is that these have different compositions, with the darker colors corresponding to higher endangerment values and the lighter colors corresponding to lower endangerment scores. And if you see the grayed out portions, those would be languages that are dormant, so usually languages that have gone extinct within very recent years. Very quickly, if you were just to qualitatively look at the data, what we see is that some communities are far worse off than others in terms of having proportionately more languages at higher levels of risk, or having more languages gone dormant recently.
So the communities that have more dormant and critically endangered languages include that of the southern parts of South America. There's a huge one over here where lots of languages have gone dormant recently. And actually one more has gone dormant recently. I think Yamana has gone dormant recently as well, so this proportion over here should actually be increased. Then the other ones that seem of note would be the Philippine Islands together with Western Melanesia and Australia. So that's yet another critical region. And then we've also got North America and parts of Northern Asia. What our meso level findings show is that the notion of hotspots is a valid one. Lots of people have been talking about language endangerment hotspots, and we were interested in seeing if ours correlate with the ones that people have brought up in the past. Krauss, for example, said that North American languages and Australian languages seem to have been severely impacted by endangerment. But he also said that languages within Central America were very affected. Our observations support the first statement, but not the second one that he made. Then Loh and Harmon say that languages in the Americas and the Pacific are most at risk, and we do have that going on over here. It also supports research that locates the top five language hotspots in the Northwest Pacific Plateau in North America, Central South America, Central Siberia, Eastern Siberia and Northern Australia. So these are all valid regions that would be mentioned if we were to let the data quantitatively speak for itself as well. It's not to say that languages in the other regions are not worthy of attention. They are all equally worthy of attention. But what's happening over here is that the languages are clustering in particular ways, and the LEI values in these clusters seem to be higher. Perhaps something's happening in these areas that should warrant further research.
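The modularity figure quoted for the 13 communities measures how much more edge weight falls inside communities than would be expected by chance, and it is the quantity that the Louvain method tries to maximise. The sketch below only computes that score for a partition that is already given. It does not implement Louvain itself, the toy graph is hypothetical, and how the authors turned distances into edge weights is not stated in the talk.

```python
from collections import defaultdict

def modularity(weighted_edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    weighted_edges: iterable of (u, v, weight) with u != v
    community: dict mapping node -> community label
    Q = sum over communities of [ inside_weight / m - (degree_sum / 2m)^2 ]
    """
    total = 0.0                   # m: total edge weight
    inside = defaultdict(float)   # edge weight inside each community
    degree = defaultdict(float)   # summed weighted degree per community
    for u, v, w in weighted_edges:
        total += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)

# Hypothetical toy graph: two triangles joined by one bridge edge.
toy_edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
             ("c", "d", 1)]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in "abcdef"}

q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles scores well above putting everything in one group, which scores zero. Values closer to 1 indicate a more sharply separated community structure.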
Then at the micro level, we are interested in the nature of each language's location. So we are coming down from macro to meso to micro, and closeness centrality is used as the measure for the micro level analysis. Closeness centrality is used to quantify the relative distance of each endangered language to all other languages in the network. What this means simply is that entities with a high closeness centrality score would be languages that are centrally located in the network. They would have lots of connections in the network, so they are very centrally connected. And because this is a spatial analysis, where the edges represent relative geographical distances, the languages in our spatial network that have high closeness centrality would also be those that are geographically central in terms of how they are connected with all the other languages. Entities that have a low closeness centrality score would be languages that are located on the periphery of the network. And in the general scheme of a spatial network such as the one that we've constructed, this would also refer to languages that are geographically isolated. So immediately we can look back at the map and look at what's happening over here. If you look at these ones, and there are some over here as well, these are highly isolated geographically.
We carried out a one-way between-groups ANOVA, comparing closeness centrality across endangerment statuses. This revealed that languages found at the periphery of the network were more likely to be critically endangered than languages that were found at the core of the network, and this was significant. We then carried out further post hoc comparisons using the Tukey test.
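The micro level steps just described (score each language by closeness centrality, then compare the scores across endangerment statuses with a one-way ANOVA) can be sketched as below. This is a toy illustration under assumptions: positions on a line stand in for haversine distances, every pair of nodes is assumed to be directly connected, and the names and statuses are hypothetical. Only the F statistic is computed. The p-value and the Tukey post hoc test are left out.

```python
from statistics import mean

# Hypothetical toy data: four "core" languages and two isolated ones.
positions = {"p1": -10, "c1": 0, "c2": 1, "c3": 2, "c4": 3, "p2": 13}
status = {"p1": "critically endangered", "p2": "critically endangered",
          "c1": "vulnerable", "c2": "vulnerable",
          "c3": "vulnerable", "c4": "vulnerable"}

def closeness(pos):
    """Closeness centrality: (n - 1) / sum of distances to all other nodes."""
    n = len(pos)
    return {u: (n - 1) / sum(abs(pos[u] - pos[v]) for v in pos if v != u)
            for u in pos}

def one_way_anova_f(groups):
    """F statistic of a one-way between-groups ANOVA.

    Assumes at least two groups and some variation inside the groups.
    """
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

scores = closeness(positions)

# Group the closeness scores by endangerment status.
by_status = {}
for name, score in scores.items():
    by_status.setdefault(status[name], []).append(score)

f_stat = one_way_anova_f(list(by_status.values()))
```

In this toy setup the isolated nodes get the lowest closeness scores, mirroring the reported pattern in which peripheral languages are more likely to be critically endangered.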
This also showed that the more geographically isolated languages also tended to be more critically endangered. Again, if we were to look back at the map that was generated, notice that the region most highly affected by language endangerment in recent times, the southern South American cluster, is characterized by such highly isolated languages in a geographical sense. These languages also tend to have extremely small speaker numbers. Among these, for example, Kawésqar is the only remaining representative of its small language family, and it's only spoken by about 10 people or so. Tehuelche is a Chonan language spoken by nomadic hunters, and the language is spoken by only three speakers in Argentina today. Early on I mentioned one of the languages that was critically endangered when we were doing this work but has become dormant in recent times. You might have read about it: the last speaker of Yamana was known to have passed on recently. So that's how things are in this cluster.
Finally, one of the last analyses that we could undertake using the micro level measure of closeness centrality is one that investigates the role of linguistic diversity. The question we asked ourselves was whether or not the relationship between the spatial characteristics of endangered languages and their endangerment statuses still held after accounting for linguistic diversity. We wondered if our spatial network could take into account linguistic diversity, and if it could be a significant predictor of endangerment statuses. We ask these questions because there's often an assumed overlap between level of endangerment and level of linguistic diversity. So for example, there is Ken Hale's 1992 paper in Language. I think the title was on endangered languages and the safeguarding of diversity.
So there's the assumption, which seems logical but perhaps ought to be tested, that the endangerment of languages equates immediately to the endangerment of linguistic diversity. Here the spatial characteristic that we utilized as a predictor was the micro-level indicator of closeness centrality, the one that I just used, while linguistic diversity is operationalized as the number of unique language stocks, including isolates, in each of the 13 communities that were returned by the community detection analysis. I ignore the contact languages here because those are a little bit more questionable where classification is concerned, or people would be up in arms if I did the classifications a little bit differently from what they expected. Language classification here is derived from the information that's immediately available on ELCat. An ordinal regression was carried out, so a very simple regression analysis, and within this simple regression model we found that languages that are found in more central locations and in less linguistically diverse regions are associated with better outcomes. Another way of envisioning these results, or recasting the issue from a different angle, is to interpret it as the opposite: languages that are found closer to the periphery, meaning more geographically isolated, and in linguistically diverse regions tend to be associated with worse outcomes. So that was that. To me it was kind of interesting that my co-PI and I found the results to be interesting in different ways. For me, as a person viewing it through the language endangerment lens, it was that language endangerment and linguistic diversity were unquestionably positively correlated.
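The diversity measure described above, the number of unique stocks per community with isolates counted and contact languages left out, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the community ids, the `"isolate"` and `"contact"` labels, and the function name `diversity_by_community` are hypothetical and do not reflect how ELCat actually encodes classification.

```python
from collections import defaultdict

# Hypothetical records: (language, community id, stock label).
RECORDS = [
    ("lang_a", 1, "stock_x"),
    ("lang_b", 1, "stock_x"),
    ("lang_c", 1, "stock_y"),
    ("lang_d", 1, "isolate"),
    ("lang_e", 1, "contact"),  # contact languages are left out
    ("lang_f", 2, "stock_z"),
    ("lang_g", 2, "stock_z"),
]

def diversity_by_community(records):
    """Count unique stocks per community; each isolate is its own stock."""
    stocks = defaultdict(set)
    for name, community, stock in records:
        if stock == "contact":
            continue
        # An isolate has no relatives, so key it by the language itself.
        key = f"isolate:{name}" if stock == "isolate" else stock
        stocks[community].add(key)
    return {community: len(keys) for community, keys in stocks.items()}

diversity = diversity_by_community(RECORDS)
```

Each language would then carry two predictors into the regression: its own closeness centrality and the diversity count of the community it belongs to.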
But Cynthia, my colleague, was interested in the computational side of things, and she was pleased with the results because even after accounting for linguistic diversity, the structural properties of the spatial network were still significantly associated with the endangerment outcomes, which attests to the robustness of the network that we constructed. Importantly, and I should perhaps have stated this somewhere but have not stated it here, linguistic diversity was only a strong predictor when contextualized within the spatial network that we constructed, within those 13 communities. If we were to randomly assign regions and use linguistic diversity as a predictor in a network that we artificially constructed, with regions based on perhaps the continent the language is found on, then linguistic diversity was a less strong predictor. What this means is that a factor such as linguistic diversity has to be contextualized spatially. So with that, I have presented findings from computationally analyzing our spatial network of endangered languages at the macro, meso, and micro levels. At the macro level, we are able to state affirmatively that linguistic hotspots are prevalent in the world, with like languages clustering with other like languages in terms of their endangerment statuses. Then at the meso level, community detection analysis using the Louvain method identifies 13 communities of languages that naturally cluster with each other, each with differing compositions of languages with different vitality statuses. We also note that among these, the southern South America cluster was most at risk. Then at the micro level, using closeness centrality as a measure, we find that spatially isolated languages are more critically endangered. Still at the micro level, we used the measure of closeness centrality to represent the spatial characteristics of the network.
We also performed a separate analysis that looked at the role of linguistic diversity. We found that linguistic diversity alone could not fully account for endangerment levels, but that the role of linguistic diversity was only significant when understood in relation to the spatial structure of the endangered languages network that we had constructed. So again, attesting to how robust the structure was. Of course, the next logical question that stems immediately from research such as this is: why do these endangered languages pattern the way they do? Situating language endangerment research within the broader spatial context is hopefully useful for helping researchers figure out what the environmental mechanisms are that trigger language endangerment, such as in the work undertaken by Lindell Bromham, Felicity Meakins, et al. So this was supposed to be an exploratory piece of work, but it's one that hopefully would be helpful for highlighting patterns of endangerment and regions that we might be paying a little bit closer attention to. These are the references that I have utilized for the presentation, and I'd also like to thank Dr. Gary Holton at the University of Hawaii at Manoa for highlighting the use of this publicly available dataset from ELCat, which is featured on the Endangered Languages platform. The terms of use follow the Creative Commons Attribution 4.0 license, so it allows for data to be shared and adapted. This speaks to the importance of Creative Commons type of work out there; that's how you could do this. This work was supported by a grant, and the methodology was mostly provided by Dr. Cynthia Sue; she's written all the code for this. And I have to thank my RA, Nadine, who did a lot of the heavy lifting as well. Our work is currently undergoing review and revisions, so that's where it's at.
So any errors are clearly our own, and that brings us to the end of today's talk. So thank you.
That's great, thank you very much, Nala. We've got some time for discussion and questions. If you have a question or a comment you want to bring up, you can use the raise hand function, or you can just unmute yourself and ask your question; otherwise you can also put a question in the chat. If you write it out, I'll read it out for you, or you can just note in the chat that you'd like to ask a question and I'll call on you. Well, Jonathan's ready. Okay, Jonathan, go ahead with the first question.
Yeah, I thought I'd get mine out of the way while people think of more interesting questions. So, Nala, excellent paper, and this looks like a really, really useful tool. I can imagine a bunch of applications for it, not least the fact that it sounds like it could be quite dynamic as well: as you constantly plug new data into it, the model constantly expands and develops, which seems to be incredibly useful. I was trying to get my head around a lot of what you said as you said it, so for a little while I was musing on one of those findings, where you said that languages that are found closer to the periphery and in more linguistically diverse regions tend to be associated with worse outcomes, which seems to me to be intuitive. So are you saying then that, you know, traditionally we attribute a lot of this to contact, right, but actually that might not be the case here? Is that what you're suggesting? Have I understood that correctly?
Could you say that again, the last bit of it?
So traditionally we associate contact, right, with a lot of the problems that you're trying to compute here.
So when you're saying that languages that are found closer to the periphery and in more linguistically diverse regions tend to be associated with worse outcomes, does that mean that contact is much less important than we might have traditionally assumed in terms of endangerment, or have I misunderstood what you're saying there?
There's natural attrition, but there's also the question of why these languages are much smaller ones. What's happening to these speakers? Are there people who are moving out of the villages and then coming into contact with other languages as well? But that's a great question: what's the role of contact? I've never thought about it, but it's something that people can talk about. So what's the role of contact in these highly isolated languages? That's a great question, and I don't have a clear answer.
That's fair enough. I suppose it was a bit unfair of me to ask, because obviously when you're saying periphery, you're saying periphery of the network, and therefore more isolated, and therefore less contact, right?
Yes, yes, we are assuming little contact in that region itself, but again, we don't know what's happening to those speakers. Perhaps there's an outflow of people to more central regions; if there's no work there and people are moving out, and so forth, then that would be the issue of contact again.
Yeah, that makes sense. Actually, I suppose I misunderstood you, because I was sort of thinking of the classic social network model, right, where you would assume periphery means broader connections outside of the core, so they would not be norming for each other.
Yeah, no, slightly different from that.
Yeah, yeah, no, fair enough. Okay, fine, thanks very much.
But I think it's a great question: what's the role of contact in these more isolated regions? I don't have a great answer to that right now, but it's something I want to think about a lot more, yeah.
Yeah, great, thanks so much, Nala.
No worries.
Another question, from Turab.
Thank you very much for the wonderful presentation. I'm Turab from Pakistan. I have just a comment. As you said, languages which are located in remote regions are more endangered, but it's the other way around in the context of Pakistan. The most linguistically diverse region is the Chitral Valley, and the languages which are located in remote regions are comparatively safer than the languages which are located in the main regions. Although they are minority languages, they are still comparatively safer. If we see the case of Punjabi: Punjabi is one of the major languages, and the number of Punjabi speakers is at the top in Pakistan, but still Punjabi is more endangered than the languages which are located in remote areas. So what's your comment about this?
Okay, so I guess it also depends on what kind of data I feed into the system. I don't think Punjabi was featured as a language that was fed into the system; it's not a language that was featured in the Catalogue of Endangered Languages. So I guess there's the question of relative endangerment. Over here we have not dealt with the languages that were deemed a lot more viable than some of these other languages, so the question is what happens when you feed in all of the world's languages. Over here we've got almost half of them, but if you fed in other languages that are deemed to be less at risk, then maybe it patterns a little bit differently. But this was the limit of the dataset that we had to play with. I think it's a great comment, and I would go look at what's happening in those regions myself.
Thank you, thank you.
I have a somewhat related question about the data and methodology. I think your data represents all languages as points on a map, is that correct? Obviously that's necessary for practical reasons, but in reality languages aren't spoken at a single point on the map, and presumably there's some correlation: languages with more speakers, and therefore a lower endangerment index, are going to be spoken over larger areas as well. I wonder if there's any chance that this way of conceptualizing language plays into how your analysis works out?
Well, I can tell you about a problem that we had, which was exactly as you said: sometimes a language is not spoken at only one particular point, and the same language might be spoken in two different countries, for example. So there was a lot of finagling with the data; we had to do a major data cleanup. We decided that if there were two coordinates provided for one language, we treated them as separate points, so the language was represented twice in the network, because these were different points. Sometimes these came with different endangerment scores, sometimes they came with the same endangerment scores, but we represented them twice. It's a very good question, though. This doesn't directly answer your question, because I don't have a good grasp of it and I'm not the one who wrote the code, but what happens if a language is a lot more spread out versus a language that's less spread out? What happens in that case?
So presumably, though, when you're double weighting other factors in the model, it might mitigate that, I'm guessing, but I don't know; I didn't write the code either.
Hopefully, yeah, that's what it does. But no, this is a great one, so let me bring that to my colleague to think about.
I mean, it is just the general problem of the granularity of the data. Ideally we would have some kind of speech communities within language groups that we're actually measuring, because there is no uniform language endangerment, especially across a larger group; one village or one city might be completely different from another. But of course, again, that data just doesn't exist.
Yeah, so that really speaks to the importance of carrying out very good documentation that's not just about the language itself but about the language, yeah. So I wish we had a dataset like that; that would be great to explore.
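The data-cleanup decision mentioned in this exchange, treating a language with two sets of coordinates as two separate points in the network, might look something like the following Python sketch. The entry layout, the `#1`/`#2` suffix scheme, the status labels, and the function name `expand_to_nodes` are all hypothetical; the speaker notes that she did not write the actual code.

```python
# Hypothetical catalogue entries: one language may list several
# (coordinate, endangerment status) pairs.
ENTRIES = [
    {"name": "lang_a", "locations": [((10.0, 20.0), "threatened")]},
    {"name": "lang_b", "locations": [((-5.0, 30.0), "endangered"),
                                     ((-6.5, 31.0), "critically endangered")]},
]

def expand_to_nodes(entries):
    """Return one network node per listed location of each language."""
    nodes = []
    for entry in entries:
        locations = entry["locations"]
        for i, (coord, status) in enumerate(locations, start=1):
            # Suffix the id only when a language appears more than once.
            suffix = f"#{i}" if len(locations) > 1 else ""
            nodes.append({
                "id": entry["name"] + suffix,
                "language": entry["name"],
                "coord": coord,
                "status": status,
            })
    return nodes

nodes = expand_to_nodes(ENTRIES)
node_ids = [node["id"] for node in nodes]
```

Keeping the original language name on each node means the duplicated points can later be regrouped, for example to check how much a widely spread language influences the results.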
Yeah. Excuse me, we have a question from Kane Edmond in the chat. He says: thanks for the great talk. Do you think that the European environment would make the picture significantly different, in particular considering the high intensity of globalism in all fields of life?
I think globalization seems to be something that is relevant for a lot more communities out there than perhaps just the European picture. For example, I'm paying a lot for petrol right now, and I'm so glad to be driving a hybrid, so I guess globalization does affect me even though I'm not in that picture. But yes, I would go look at it. I can't answer it directly, because I do not particularly think that Europe is much more globalized than other areas; globalization is happening at an unprecedented rate in other areas as well, in South Asia and so on and so forth. We are all interconnected and affected by what's going on in the world, clearly, with this pandemic and the social situation that's going on. But I could go look at the data and figure out if it says anything different about the European picture. A good question, yeah.
Any other questions or comments?
I'm going to ask you one more question, but more of the sort speculating about the meaning of your results, so I recognize you might not have an answer. It's about that particularly interesting cluster that includes the Philippines, West Indonesia and Australia. Intuitively that seems coincidental; I mean, what does Australia have to do with Indonesia except geographic proximity? And I think you've specifically done this in a way that puts geographic proximity above everything else, as an interesting way of looking at the data, so this is obviously by design and not really a flaw in the method. But I'm wondering, how do you decide whether this cluster isn't just coincidental? Or is there some explanation for this cluster? What's meaningful about this cluster that brings together places as different as Australia and Indonesia, which may seem to have only their geographic proximity in common?
Yeah, so we're assuming here that geographic proximity means that they experience similar kinds of reality. But that's a good question; this is actually one for Cynthia, and I wish she were here. We tested the model at different confidence values, at different scores, and our model of 13 communities ended up being a very good predictor for everything else. For example, when we threw linguistic diversity in, this set of 13 different groups would still account for what was happening in terms of predicting endangerment statuses, but when you decrease or increase that, the ability of the model to predict those things falls, as it were. She's run a lot more tests with it. But you're right, it's how we've conceived this whole plan: it's letting the data speak for itself quantitatively. But of course we know that these things have to be weighed against what we know about these regions, for example, yeah.
Yeah, okay, so this is one method of saying how much can we learn just from geographic proximity, and of course there are other lenses for looking at language endangerment to try to understand a complex situation as well.
Yeah, so this is just the dataset that we had to work with. Of course we're trying to do other things with the dataset right now, but this was the first piece that we could push out from it.
Are there any other questions, a final question or comment? Good. Well, Nala, thank you so much for being here to present with us today. We really appreciate seeing this research and these innovative pushes towards using the data that's available, even if it's not always the best data or perfect datasets, to try to understand more about what's happening with endangered languages around the world. So thank you for putting this together, and we look forward to seeing the final paper when it's out.
Okay, thanks for having me today. Yeah, take care, guys.
Thanks, everyone, for coming.