Okay. Well, thank you to everyone who's able to join us today. Our speaker for today's webinar is Dr. Hadas Kotek, who's a syntactician, semanticist, and experimental linguist, and who has an affiliated research position at MIT, where she also received her PhD in linguistics. Since then, Hadas has worked as a postdoctoral fellow at McGill University and also held teaching positions at NYU and Yale. Her presentation today is on an ongoing collaborative project, working with many other people whom I think she'll mention in her presentation, so I won't try to list all the names, on the topic of gender representation in constructed example sentences. So welcome, Hadas. Thank you for taking the time to join us and share your work with us. We look forward to hearing what you have to share. As mentioned, we'll keep our microphones and cameras off. We'll be listening intently, even though it may not seem like we're here during your presentation. And then we'll save the last 10 or 15 minutes of this hour together for questions and responses. You can ask your question by writing it in the chat, or by raising your hand, and then at the end I'll ask you to unmute and turn on your camera so you can ask your question. So with that, let me hand it over to Hadas, and I'm looking forward to what you have to share.
Alright, thank you for this introduction and thank you for inviting me. I should also point out that some of my co-authors are also present here on this call. And so if you ask questions in the chat, you might actually get quite quick answers to your questions. And I may refer some of the difficult questions specifically to them, so we shall see. Let me share my screen, and I apologize in advance for noises in the background. Those would be my cats going crazy. You may have already seen one of them. Okay. So hopefully I'm sharing my screen. Looks good. And hopefully I'm fine with full screen mode. Hopefully it doesn't get stuck, but tell me if it does. Can you see me go to the next page? Okay.
So, alright, I'm going to get started. What I'm going to be reporting on today is a project that has inspired three papers that are related to one another, and that has been done in collaboration with two separate groups of colleagues and co-authors. The first part, which I will actually go over kind of briefly because I want to spend more time on the second half, was done with a group of members from the LSA Committee on Gender Equity in Linguistics, COGEL. This is a new name; we used to be called COSWL, the Committee on the Status of Women in Linguistics. And this is a picture of us presenting a few years ago, back when you could travel to conferences and present. But that earlier work inspired the work that will be the main part of what I will talk about today. And that part is a collaboration with three Yale graduate students, or former graduate students by now. I'll say more about precisely how we build on and expand that. And all three of them, Rikker, Sarah and Chris, are on this call today, which I'm very happy to see. And hopefully they can help me and help you make sense of everything that I'm saying.
So let me start by giving a bit of a broader background for this talk and this topic and why we're interested in it. So in 1996, the Linguistic Society of America publishes what it calls the Guidelines for Nonsexist Usage. And this is its first attempt to write down and give guidelines for how to be inclusive with respect to gender in writing that relates to linguistics. And around the same time, in 1997, Macaulay and Brice publish a paper in Language that is an analysis of 11 syntax textbooks that were published between 1969 and 1994, so quite a range. And they conclude that the majority of constructed example sentences in syntax textbooks are biased toward male-gendered noun phrases and contain highly stereotyped representations of both genders.
And then 20 years later, together with my colleagues from COGEL, we conduct a similar study and report on similar results from six syntax textbooks that were published more recently, so between 2005 and 2017. We'll see more about both of these studies in a bit. Before I move on, though, I want to say right now that we do recognize that gender is not a binary and that people who identify outside of the gender binary may or may not even adopt gendered language to refer to themselves. And furthermore, that this has nothing to do with the sex that they were assigned at birth. You will see us talking about gender as if it is a binary in this talk, and that is a limitation of our study. We've spent a lot of time thinking about this, and I'm happy to say more about this toward the end of the talk in the Q&A. We spent a lot of time trying to think about how to do better, but I want to flag that just because we say male and female and kind of ignore additional complexities does not mean that we don't recognize that those exist and are really important.
Okay. So what I want to spend the majority of my time today on is a study of gender representation in example sentences in journal papers that were published between 1997 and 2018 in three leading theoretical linguistics journals. And those are Language, Linguistic Inquiry, and Natural Language and Linguistic Theory. And specifically, the question we wanted to address was whether the bias that has been observed in syntax textbooks extends beyond this limited genre and into research, into scholarly work in linguistics broadly. And we will show you that the answer is yes. And once we get that far, then we want to know: what can we do about that? How can we improve? So here's my plan for today. We've done the introduction.
I'll spend a bit of time talking about gender representation in textbooks, covering both the Macaulay and Brice paper and my collaboration with a subpart of the COGEL committee. And I'll spend the majority of my time talking about gender representation in journal papers, and then discuss both why this matters and how we can try to improve on the situation that we will show you.
So, all right, let's dive into textbook example sentences. Okay, so Macaulay and Brice 1997 is a comparative study of constructed example sentences from 11 syntax textbooks that were published between 1969 and 1994. The first part was a close investigation of one specific textbook, and some skews in the representation of gender were found in that one study. The study that we're really interested in is the expansion, the additional study that looked at 10 additional textbooks and tried to investigate whether this is something general, or whether it maybe just happened to be that they picked one book that had issues, but this wasn't a general issue. So, all right, for this larger study that looked at 10 syntax textbooks, what Macaulay and Brice did was sample 200 examples from each of these textbooks, identify all of the noun phrases in those examples, and then code them, that is, assign them labels for a variety of factors of interest. And so, the first one is grammatical gender, which could be female or male. Then grammatical function: something can be a subject, it can be a direct object, it can be an indirect object, and maybe there are a few other more minor types of functions that one could imagine, but these are the main ones.
Then thematic roles. So these are kind of semantic relations: is the noun phrase an agent of an action; is it a patient, so having an action kind of acted upon it; is it an experiencer, so is it kind of experiencing something but nothing really is changing in the world; is it a recipient; is it a goal. There are a few others, but these are the big ones. And then another factor of interest is just general lexical choices in the example: is there a pronoun, or is there a proper name, or does the sentence describe violence, or does it describe someone's appearance, or is it about reading and writing, and a few other of these kinds of choices.
Okay, so in the interest of time I'm going to just kind of tell you what the results are and not show you numbers for any of these, at least in this part of the talk. So here are the main findings from Macaulay and Brice. They identify that men occur more often as arguments generally than women do. They are more likely to be subjects and agents than women. They have proper names and pronouns more often than women. They engage in intellectual activity, which basically means they are the ones who read and write books, and they appear in car-related events more often than women do. They are described as having occupations more often than women, and they have a broader range of occupations, so they just do more things, and they engage in violence more often than women. On the other hand, women are referred to using kinship terms more often than men are, so they are someone's wife or mother. Those are the two most common ones. And they have their appearances described more often than men do.
So here are a few examples of what we mean by that. I'm not going to read all of these, but I will read a few of them. So: "Every painting of Maya and photograph of Debbie pleased Ben." "Harry watches the fights and his wife, the soap operas." "Bill is proud of his father and tired of his mother."
"John might drown." A selection of arguments that vary in some ways. I will not read all of them. "Stephen likes that Maya hates the man next door." "We consider him to be a genius and her to be a fool." I'm going to just kind of highlight these and let you look at the other ones. I'm going to take maybe 15 seconds to drink my coffee over here. And as an exercise to the reader, pick one and try to imagine what syntactic property it is that this needs to show, right? Could another example show it? Or just, yep, I will pause for a few seconds.
Okay, I have more examples to show you. In addition to these, you know, stereotypical examples, we also find, or Macaulay and Brice also find, explicit and suggestive language in examples. So again, I definitely will not read all of these. Some of these, I don't want to read them. But here are a few. So: "What a nice pear Mary's got", where pear is a wordplay on pair, P-A-I-R. "John forced Mary to be kissed by Bill." "He once glanked an out-of-work actress." Not reading most of these. "She's fond of John naked." This is my favorite. I read this last one, but again, as an exercise to everyone on this call: what is this showing? Why is this here? There is an answer. But once we have that, we could possibly come up with a better example for why this is here. And in particular, it's not completely obvious that the explicit language here is necessary.
So kind of summing up, Macaulay and Brice comment: "Our results clearly illustrate the need for such scrutiny." So they're trying to explain why they're conducting the study, because they have been getting pushback: is this real, and do we need this, right? So they say females are simply not significant actors in the world constructed by sample sentences. And we want to point out that neither are non-binary individuals. They're not even discussed. But that was almost 25 years ago. And so one wonders, is that still really the case now? So to foreshadow, the answer is going to be yes.
And to kind of show you how the answer is still yes in the context of textbooks, I'm going to again briefly talk about my study with the other committee members of COGEL, the Committee on Gender Equity in Linguistics. So we looked at six syntax textbooks that were published between 2005 and 2017. And we used the same factors of interest as Macaulay and Brice in our study. And again, I'm just showing you major findings and skipping graphs, but you can ask me to show you graphs if you're curious. We found a total of 1,200 or so gendered arguments, and of those we had a ratio of two to one male to female arguments. So 34% of our arguments were female and the rest were male. So this is, I think, a significant skew in total numbers. And we furthermore find that it is consistent across all of the books that we looked at, and it is consistent regardless of the language of the example. So this is something we were interested in that Macaulay and Brice did not control for. We were curious about whether the skew is contributed by data from other languages. So in particular, if the author is reliant on someone else's research and someone else's research is biased, maybe that's why they're forced, in a sense, to pick biased arguments or biased sentences. And we find that that's probably not, or at least not, the full story, because this is true in English. It is true in French. It is true in German. It is true regardless of what language we were looking at.
Okay. So a quick summary again. Men still appear more often as arguments than women do. They're still more likely to be subjects and agents. They still engage in intellectual activities like book reading or writing or handling more often. They are still described as having occupations more often. And they still have a broader range of occupations. And they engage in more violence than women.
And when you look at the actual predicates, you discover that the violence is more severe when men engage in it than when women engage in it. Okay. Some things have improved. So that's what this is saying. Some ratios did improve between the earlier study, Macaulay and Brice, and this work. One thing that's really important, I think, to point out, and that we can talk about later, is that the actual explicit content is basically gone. We could find in our entire study three such examples. So basically almost none. We don't see discussions of women's appearances. We don't see discussions of women pleasing men. Again, sexually explicit, suggestive language, all of those things just are not there. We also don't see examples concerning men and cars. That's absent too. But all of the major findings from Macaulay and Brice are still the same. We also see this big skew in the choices that authors make about who is a subject or a non-subject; who is an agent, an experiencer, or a patient, so having an action acted on them; who is described as being a genius; who is described as reading books; who is described as being a professor or a teacher or, you know, a CEO or a secretary, and so on. In conclusion, we tried to find non-gendered or gender-neutral names in particular. We were interested in that. So looking at names, we find that in fact very few are gender neutral. So there isn't an apparent effort to make examples non-biased, at least in this way. And again, explicit discussions of non-binary gender identities are just absent from these textbooks that we were looking at.
Okay. So this is a lot of background on the state of example sentences in textbooks. I want to move on to talking about journal papers. And this is where I'll spend more time showing you breakdowns of numbers, and I'll show you graphs. Again, I guess we're not taking questions during this talk, but feel free to kind of comment in the chat or ask questions if anything is not clear.
So to put this in a broader context, textbooks are a very specific genre. And so we want to know if this is indicative or illustrative of linguistic research more generally. And so to answer that question, what we did was take three journals in theoretical linguistics, and those are Linguistic Inquiry, Natural Language and Linguistic Theory, and Language, and extract all of the papers from those three journals that were published between the years 1997 and 2018. So 1997 being the year that Macaulay and Brice was published, and 2018 being the year that we started doing the study. In total, this gave us 927 papers and about 25,000 third person human or animate arguments. And so hopefully this is quite a large data set and should allow us to be quite confident about any conclusions that we draw.
To say a bit more: now it doesn't make sense anymore to sample by hand, so we can't use the same kind of methodology that the previous work did, since their numbers were much smaller. So instead, what we did was extract examples from papers using regular expressions. These are kind of formulas you can write to identify patterns in text. Example sentences are convenient in that they all look the same. So, you know, there's going to be a parenthesis and then some numbers and then another parenthesis, and then the text is going to be set off from the margin. So we can write kind of a rule to try to identify all of those examples. This does mean that we're not looking at any example sentences that are just in the running text; there's no good way to find those, so we're not looking at those. But again, it gives us 25,000 arguments. So hopefully that's a large enough sample to be quite confident about. We're interested generally in the same properties as these two previous papers. There are a couple of others I will show you later that we did a little bit differently.
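The regular-expression idea described above can be sketched roughly as follows. This is a minimal illustration only, not the pattern used in the study: the layout it assumes (an example starts its own line with a number in parentheses, optionally followed by a letter) and the names `EXAMPLE_RE` and `extract_examples` are assumptions made for this sketch.

```python
import re

# Assumed layout (hypothetical, not the study's actual rule): a numbered
# example sits on its own line, optionally indented, and starts with "(12)"
# or "(12a)" followed by the example text.
EXAMPLE_RE = re.compile(r"^[ \t]*\((\d+)([a-z])?\)[ \t]+(\S.*)$", re.MULTILINE)


def extract_examples(text):
    """Return (number, subletter, sentence) tuples for each numbered example."""
    return [
        (int(num), sub, sentence.strip())
        for num, sub, sentence in EXAMPLE_RE.findall(text)
    ]


sample = """As shown in (1), the subject precedes the verb.
   (1) Mary read the book.
   (2a) John gave Sue a letter.
Running text mentioning (3) inline is ignored.
"""

for example in extract_examples(sample):
    print(example)
```

As in the talk, a rule like this only finds examples that are set off from the text; sentences quoted inside a running paragraph are not matched.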
And now, instead of doing all of this work ourselves, we hired a small army of Yale undergraduate students to do this work for us. So we taught them how to identify these properties of interest. Right. So is an argument a subject, is it a recipient, is there violence here? So we defined the factors of interest. And we had them do the work, with at least one of us doing quality control, making sure that things look good. We also did some things automatically, so not with humans assigning labels. So for example, for looking at emotions, we could do sentiment analysis. And in some cases, again, we could use regular expressions. So for example, for kinship terms, the list of kinship terms is not very large. We could just try to construct a list and identify all of those tokens.
Okay, so let me now go into a section of the talk where I show you lots of graphs and numbers. And here's the first one. So this is just the overall distribution of subjects and objects, so all arguments, I should say, in the papers that we saw. We're identifying here female arguments, male arguments, and ambiguous or non-gendered arguments. Those could be things like "the student", when there isn't a pronoun or gender agreement or some other way to identify that it's gendered. It can be a name that could be ambiguous, like Taylor, for example. And so we find that of the gendered arguments, and the majority of the talk will just look at those and compare the female arguments and the male arguments, there is a bit more than two to one male to female ratio. So for every one female argument in example sentences, there are more than two male arguments. And furthermore, we can also look at how this has changed over time. And so here is one way to look at that. We're looking at time on the x axis and the ratio of female to male arguments on the y axis.
So what you're seeing here is a trend toward a slight improvement. And if you kind of squint at the actual graph, you see that it's an improvement from roughly 0.3 to roughly 0.33 or maybe 0.34. So not a huge increase, but an increase. But once you break this increase out into subject and non-subject arguments separately, in fact what you find is that there has been a slight decrease in the number of female subjects over time in example sentences, and an increase in non-subject arguments that are female over time. So that slight trend toward an increase that we were seeing on the previous slide really is contributed by an increase in the proportions of non-subject arguments. Which, I don't know, is a bit discouraging.
So next I want to compare examples in English and non-English. And the way we're showing this is using this kind of mosaic plot. If you've never seen one of these before: on the x axis, we have our factor of interest, which here is English or not English. On the y axis, we have the two categories we're comparing, so male and female. And the width of the bars corresponds to how many total examples we find. So the non-English bar is about twice as wide as the English bar. This indicates that the numbers in the non-English pool are about twice as high as on the English side. And then the last thing that you can kind of squint at and see here is the proportions, but since you need to compute those, we just give you those on the side. So in English examples, we find that there are 33% female arguments and the rest are male. And in non-English examples, we find 30% female arguments. So these are similar proportions; we think they are statistically the same. Okay, we can also break this down by journal and try to compare across journals and identify trends.
And really what we find here is that the numbers are very consistent across the three journals. So some of the counts are different. You can see that in Natural Language and Linguistic Theory, on the right, there are generally more examples with human gendered arguments than in Linguistic Inquiry or in Language. But nonetheless, the proportions that we find are strikingly similar. So 32 or 31% female arguments and the rest being male. And this is convenient in that it's going to allow us to collapse our data and show you numbers that consist of data from all three journals, since they exhibit the same behavior. So for the rest of this talk, we will show you data that collapses information from across the three journals.
Okay, so we can talk about grammatical function next. We can compare subjects and non-subjects, or objects, with respect to whether they are male or female. And I think it's maybe most convenient to just look at the proportions at the bottom there. What we find is that 83% of male arguments are subjects and 79% of female arguments are subjects. So a bit fewer female subjects than male subjects. Not a huge difference, but a bit of a difference.
We can also look at thematic roles. If you're not familiar with those, and I know that fewer people are familiar with those than with the concept of subjects and objects, those correspond to semantic roles. I think that's a good way of putting it. So it's trying to describe what the argument is doing in the sentence. Is it being active and initiating an action, or is it experiencing an action or experiencing an emotion, for example? So if you fear something, the world isn't changing, but something internally inside of you is changing. If you're a patient, then you're being acted upon. If you're a recipient, then you're receiving something. So we can see that there are more agents than other roles.
This kind of makes sense. So agents and experiencers normally will be subjects. This correspondence isn't perfect, but it is at least consistent. So more agents than other roles, but you can see that for the agents and the experiencers, so the roles that correspond most often to the grammatical role of subject, those have 30 percent female arguments and the rest are male. When you look at patients, you see an increase in female arguments, so now 35 percent. And for recipients, all the way on the right, you can see by the fact that the bar itself is really thin that the numbers are quite small. And I apologize for the number maybe being slightly hard to read. But the proportion, which you can see on the right there, is now 42 percent female. So again, an increase in the proportion of female arguments in these less active, these acted-upon types of roles.
Let's talk about some lexical choices in our example sentences. So on the left, we can ask whether there is a difference in proportions of names. Do men or women have more names than the other gender we're looking at? And the answer here is no. You can see on the left that generally there are quite a lot of names in example sentences. But in terms of proportions, they are the same across male and female. So 59 percent of men and 58 percent of women have proper names in example sentences. On the right, we ask about pronouns. So this graph is breaking out pronouns from non-pronouns. And what we're seeing is that male arguments have pronouns associated with them 29 percent of the time. Female arguments have pronouns associated with them 23 percent of the time. We think this is probably attributable to the fact that there are more male subjects than female subjects, and a pronoun, if it's going to exist, is likely going to be a subject more often than a non-subject. So this, we think, is just a byproduct of the general tendency to have more subjects that are male in example sentences. More about names.
So we can look at the most common names in example sentences. We have the most frequent male names on the left and the most frequent female names on the right. And you can notice that they are not diverse. So in the top five male names, we have John and Juan, but really John. John is dominating everything else. On the right side, for female argument names, we find Mary, Maria and Marie. So again, choices that are fairly limited. Of the over 10,000 names that we identified in the study, 428 were classified as non-gendered or ambiguously gendered. So this is just 4 percent of the data. We don't feel confident saying very much about numbers that are that small compared to the totals. But I think it's interesting that these are the choices that we're making.
More on specific lexical choices. We can look at examples that describe occupation. So, you know, professor, teacher, banker, secretary and so on. And here we find that men are overrepresented in these examples, even given the total 2 to 1 ratio that we find in the overall sample. So the overall sample would suggest we should find 66 or 67 percent male arguments here. We find 74 percent male arguments here. Looking at violence, males are now significantly overrepresented in these examples. So 84 percent of all arguments in violence-related examples are men. And now we're interested in breaking those out according to whether the argument is a subject, so probably inflicting the violence, or a non-subject, so the acted-upon object or argument in the sentence. And what we find is that women are subjects of violence-related sentences 68 percent of the time. Men are subjects 72 percent of the time. So again, a slight skew towards males being subjects more often. Okay, on the next slide we're going to show you data where women are going to be overrepresented. So the first one we're interested in is what we call romance-related examples.
So, you know, kissing, hugging, liking, loving, those kinds of things. And here we find that only 50 percent of all of our sample is male arguments. And given the two-to-one skew that we generally find, this suggests that women are overrepresented in these examples. We should not find just 50 percent; we should find 67 percent male arguments if there was no additional skew. Okay, and then it becomes even more interesting when we break things out by subject or non-subject. We find that women are subjects of these sentences 58 percent of the time. Men are subjects 76 percent of the time, right? So in addition to this skew of just having more women generally in these examples, it's even more likely that they will be acted upon, non-subjects, than otherwise.
All right, this is the part of the talk where I start to lose my voice. So I'm going to mute myself and cough once in a while, and I apologize for that. All right, so one more thing to do with lexical choices that I want to show you has to do with kinship terms. Now we find that women are massively overrepresented in these examples. So these are only 44 percent male. And again, just to remind you, we would expect 67 percent male if there was no additional skew. So I think this is quite striking. So when women are going to be chosen more often than otherwise in the sample, it's most likely that they will be someone's mother, someone's wife, someone's daughter, and so on.
Okay, the next thing I want to show you is sentiment analysis. So this is where we go into methods that are not just counting, but do something more, a bit of an analysis. And in particular here, what we're doing is using two existing packages, one on this slide and one on the next, that automatically categorize predicates, verbs, into types of emotion, and then we can do the same kinds of counts on these sentences as we were doing before.
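The baseline comparison used for occupation, violence, romance and kinship can be made concrete with a little arithmetic: a 2 to 1 male to female ratio corresponds to a male share of 2/3, about 67 percent, and each category's observed male share is read against that. The sketch below only restates the percentages quoted in the talk; the function names and the simplification of the overall ratio to exactly 2 to 1 are assumptions made for illustration.

```python
def expected_male_share(male_to_female_ratio):
    """Male share implied by a male-to-female ratio (2.0 gives 2/3)."""
    return male_to_female_ratio / (male_to_female_ratio + 1)


def representation_index(observed_male_share, baseline_male_share):
    """Above 1: men over-represented relative to the baseline.
    Below 1: women over-represented relative to the baseline."""
    return observed_male_share / baseline_male_share


# Baseline simplified to exactly 2:1 for this sketch (an assumption).
baseline = expected_male_share(2.0)

# Observed male shares as quoted in the talk.
observed = {
    "occupation": 0.74,
    "violence": 0.84,
    "romance": 0.50,
    "kinship": 0.44,
}

for category, share in observed.items():
    index = representation_index(share, baseline)
    direction = "male" if index > 1 else "female"
    print(f"{category}: index {index:.2f}, skews {direction} beyond the baseline")
```

Reading the figures this way keeps the overall skew and the category-specific skew apart: a category with 50 percent male arguments is balanced in absolute terms, yet it over-represents women relative to a sample that is two-thirds male.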
So specifically, the Bing method is just going to categorize emotion into positive or negative. So in general, a verb or a predicate in a sentence could either be positive or it could be negative, or it could be neither, in which case it's going to be discarded from this analysis. So this is why these numbers look much lower than the overall numbers that we were looking at throughout the rest of this talk. But generally what we find is, for predicates that were categorized as either positive or negative, there is a skew towards being more negative for male arguments, and a skew toward being more positive for female arguments. So 2.2 to 1 is the general skew, 2.5 to 1 for negative emotions, 1.7 to 1 for positive emotions.
Okay, the NRC method categorizes predicates into more fine-grained categories. Those are listed over on the y axis in this graph. So we get anger, fear, negative or general negative, sadness, disgust, surprise, positive or general positive, trust, anticipation, and joy. So these are the categories of interest. And again, if a predicate doesn't fall into any of these, then it's just excluded from this particular analysis. The black bar there shows you the 2.2 to 1 ratio. So anything to the right of that skews male beyond the general skew that we find, and anything to the left of that skews female beyond the general skew that we find. And so, again, what we find is fairly consistent, right? So all of the negative emotions seem to be skewing male to some degree. Surprise is kind of not skewed, and everything else is skewed female.
Okay, so this is the part of the day where I read a few examples. We've selected quite a few. It's hard to pick just a few. But I do want to point out, as I read these, that our goal is never to just point out a few or to make fun of anyone in particular or to call out any author. What we're showing you is also not cherry-picked; rather, it's going to be an illustration of something that is very general.
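The lexicon-based counts just described can be sketched as follows. This is not the study's code, and the Bing and NRC lexicons themselves are not reproduced here: `TOY_LEXICON` is a hypothetical stand-in with a handful of made-up entries, and the records are invented solely to show the mechanics of discarding uncategorized predicates and computing a male to female ratio per category.

```python
from collections import Counter

# Hypothetical stand-in for a sentiment lexicon (NOT the Bing or NRC data).
TOY_LEXICON = {
    "love": "positive",
    "admire": "positive",
    "hate": "negative",
    "fear": "negative",
}


def male_to_female_by_sentiment(records, lexicon):
    """records: iterable of (predicate, gender) pairs, gender 'male' or 'female'.
    Predicates missing from the lexicon are discarded, as in the talk.
    Returns {sentiment: male count / female count}."""
    counts = {}
    for predicate, gender in records:
        label = lexicon.get(predicate)
        if label is None:
            continue  # neither positive nor negative: excluded from the analysis
        counts.setdefault(label, Counter())[gender] += 1
    return {
        label: c["male"] / c["female"]
        for label, c in counts.items()
        if c["female"] > 0
    }


# Invented records, for illustration only.
records = [
    ("hate", "male"), ("hate", "male"), ("fear", "male"), ("hate", "female"),
    ("love", "male"), ("love", "female"), ("admire", "female"),
    ("sleep", "male"),  # not in the lexicon, so it is dropped
]

print(male_to_female_by_sentiment(records, TOY_LEXICON))
```

Each resulting ratio would then be compared with the overall male to female ratio, which is what the black reference bar on the slide represents.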
Okay, so here are some examples. All right, so: "Which Nobel Prize-winning author came in his car?" "At least one student of every professor is horrified at his grading policy." "No linguist here recommended some of his own books, but I don't know which of his own books." An example of some complex sluicing, an ellipsis-type example, but suggesting that linguists are men. "Mary, being dumb, needs to sit down." "Ray's mother thinks he is a genius." "Olyama's sister-in-law needed a scarf." There's more, so I'm going to show you kind of another bunch of examples on the next slide, just to kind of drive home the point that this is quite a general thing. So: "John ate the meal and Mary cleaned the dishes." "John didn't eat the meal because he would have had to clean the dishes." Those are from the same paper. "John thinks that he himself is a war hero." "John told Bill that Mary began to cry without any reason." "Kelly broke again tonight when she did the dishes." "For whom do you regret that she made a cake?" Yeah. Again, I do want to point out that what we're seeing is stereotypes, not just of women, but of men as well. So this is a property that doesn't single out just one gender for this treatment of picking stereotypes. It's quite general.
So I'm kind of summing up and showing you a slide that looks kind of like what we saw before, because what we saw before is kind of still true today. So men appear more often as arguments in these example sentences. They appear more often as subjects and agents and experiencers. They engage in significantly more violence. They have significantly more occupations. And they exhibit more negative emotions. Whereas women are overrepresented as recipients and patients. They are overrepresented in romantic examples. They are massively overrepresented in, or over-referred to using, kinship terms. And they seem to exhibit more positive emotions.
Some things that are interesting are maybe more similar to what we find in the more recent textbook study than to the original Macaulay and Brice paper. So there is not much suggestive or explicit language in the examples, although we do absolutely find stereotypes in our examples. Again, the language of the example didn't make a difference. So this is not just an effect of not having access to better resources. These are very general choices. And we're seeing a slight increase over time in the numbers or proportions of female arguments. But it seems like those are caused by an increase in non-subject female arguments, instead of being a more general, across-the-board increase in the number of female arguments. We think we can do better. We hope we can do better. So this is what we want to spend the rest of the time today on. Okay, so one thing to say is that explicit discussions of non-binary gender identities are just entirely absent. There are other things that we could have discussed that we didn't. And you can ask me about them; we have thoughts about all of these. For example, the use of non-Western names, or how this might compare to corpus examples, so not constructed, but rather, you know, naturally occurring ones, or what we think about explicitly elicited examples and fieldwork narratives. There is work that goes beyond what we've done already. But okay, in the interest of time, some of it I will do at the usual speed and some of it I may choose to skip, and we can go back to it if there is interest. Okay, but why do we think this matters, right? So constructed example sentences are one of the main sources of data in theoretical linguistics. And these examples that we see in these papers are cited over and over again. They're often divorced from the original source that they were given in, just treated as an example from the literature of some phenomenon. And we see that they encode biases, sometimes subtle, but certainly existing ones.
And those get handed down to new generations of linguists, and that perpetuates the cycle. In the interest of stating what I hope is obvious, inclusive language encourages participation from underrepresented groups. And that leads to a better community, and that leads to better science, at the cost of not very much effort. It's just a little bit of thoughtfulness. In particular, one thing we can do is go beyond the familiar names, so John, Mary, Bill and Sue. Think beyond the first names that come to mind when you cite someone, when you invite someone to an event. Generally, these small actions can go a long way, and I do want to pause and say, this is hard. So even for myself, having thought about this for at least five years, just being engaged in this study and the other one, John still is the first name that comes to mind for me. It's very ingrained, but I've trained myself to just pause and take a breath and change the first thing that comes, or move to the second thing, the next thing that comes to mind. It is a bit of an effort, but I think it's worth it. So this is kind of a more American-centric slide, but the Linguistic Society of America has tried to address some of this. There are some resources that are relevant. So in 1996, there was the first attempt, the guidelines for non-sexist usage. In 2016, there was an effort to revamp these and change them. We're now working with the guidelines for inclusive language, which you can find on the LSA website. We had a panel that specifically addressed all of these issues. And just a month ago, COGEL, the committee I'm on, published this list, this set of resources on equity and inclusivity in linguistics. If we get the time in the question period, I would really love to just show you what this looks like. But I do want to just point out that there are resources out there to help you. I want to discuss a couple of objections explicitly here now, since they come up on occasion.
So the first one is, we hear sometimes, you know, what you're saying threatens my free speech, or constrains my creativity, or feels like censorship. And what we want to say to that is, well, maybe, but if an example could potentially hurt someone and the content is just not relevant to illustrating the phenomenon that you're trying to illustrate, then linguists can and should find other means to illustrate the linguistic points that they're trying to make. The second possible objection is, you know, maybe you've convinced me that there's a problem, but it's really not clear what we can do about it. And yes, that's true. It's hard and it's probably multifaceted and complex. But we would welcome any effort to reverse the skews that we're seeing and to present linguistic examples in a way that celebrates and honors the diversity of individuals representing our field. So really, I'll give some specific examples of what you can do, but any small action is going to be a step toward improving this. So in the interest of being clear: for one, stereotypical language, sexually explicit and demeaning language, and language that reflects biases can be avoided and should be avoided. The use of gendered lexical items, so like congressman, or he as a so-called, you know, inclusive or non-gendered pronoun, that's unnecessary and could be avoided. And the biased and elevated frequency of particular gendered NPs in particular syntactic positions and semantic roles should be diminished. So this is a way of saying there is no particular reason why men need to be subjects more often and women need to be non-subjects more often. So this is something that we can fix. Embrace singular they. We do hear sometimes that the male pronoun is said to be the correct choice for singular nouns whose gender is not known. But even if technically, you know, you think this is correct, it feels exclusionary of anyone who's not a man.
Ask the non-men in your life if you're not sure. I can tell you from my experience that I don't feel particularly included by the he pronoun. And then on the other hand, the good news is that singular they has been around for decades precisely for this purpose, and we should use it. This is the part that I will just kind of say, and I will skip the next two slides. We do recognize that this is not special to linguistics, so we didn't invent this particular type of skew and this particular way of thinking. It is a very general societal trend. It starts very early. It is very entrenched. But just because that is the case doesn't mean that we have to repeat it in our scientific field of study. We can in fact choose to not do that. So I'm going to skip these few slides. You can ask me about them, but they just show particular studies that illustrate just how endemic and entrenched it is. Okay, so this is my last slide. So what can you do specifically? Well, if you're an instructor, you can choose your examples carefully. You can be sensitive to how you portray all individuals in your examples, and you can keep in mind that you're in a position of authority and you can have a positive influence on young minds that are entering the field. You can also think about gender ratios and representation in your syllabi. So who do you teach, basically? If you're an author, you can be thorough and inclusive and balanced in your citations. You can choose not to perpetuate bias in the examples that you construct and that you cite. You can keep the guidelines for inclusive language in mind when you write. If you're an editor or a reviewer, you can pay attention to the examples and the language that authors use, and you can comment on that in your reviews or in your decisions.
If you're a conference organizer, I hope that you will check out the REIL guidebook too, which is again Resources for Equity and Inclusivity in Linguistics, and which gives very specific ideas for how you can be more inclusive when you organize a conference. And with that, I will thank the audience and this big list of individuals who've helped us over the years, and I will be happy to take your questions.
Thank you so much. That was very clear, with really interesting patterns, both to see what has changed and of course what hasn't changed. These can be very contentious and politicized issues that people can of course have strong feelings about, but I think the emphasis on the immediate impact on people who are learning about linguistics is a good way to build consensus and motivation, just thinking about those people who are trying to enter into the field and what the more immediate impact is on them. I've got several questions about methodology in the chat already and some responses from your co-authors. So I'll read off some of these questions, and if it's your question and you would like to read it off or articulate it for yourself, feel free to raise your hand or turn on your video or microphone and let me know. So there's a question from Annie, who was wondering if misogynoir has been investigated in this study, and there's a response from Chris on generally the difficulties of looking at race in any way in that kind of a study, because it's not, you know, a grammatical feature in any languages. Do you have any comments on issues of race that intersect with some of the issues that you've addressed in this presentation?
So the main thing to say about that is that the numbers of arguments that we can identify as being non-white are small, very small. So really the only way we can identify race is through names, I think, because it's not a grammatical feature and you don't see agreement, for example.
And again, you saw the slide with names; the choices of names are just very Eurocentric and white sounding. So while I agree this is very interesting, it's just hard to see how this particular sample can help us shed light on that question.
Then there's a question from Peter Austin asking about the correlation between the gender of the authors and the distribution of gender in their examples. So Chris did say that that's something you've looked at and could chat about. I don't know if you or Chris want to respond to that question in more detail.
I would leave it to Hadas, if you've got that. We do have a section about that in the paper on this and in other versions of this talk. I don't know if you have a pocket slide about that.
I do not have a pocket slide, but I probably should have prepared one because I've been asked this before, but I did not come prepared. So you will see me now flounder for a moment as I try to find the paper.
I can fill in a stat in the meantime while you're doing that, if you don't mind me beginning. It was really similar. So it was something like, male authors had something like 32% female arguments and female authors had 36%, so, sorry, 32 to 36. So female authors used very slightly more female arguments, but not much more. It was very similar to the overall ratio.
So I think this is the graph that we want to be looking at. Ricker, tell me if I'm wrong.
Your screen is not shared right now.
That is a good point. That is a very good point. You're right. So let me try my screen for a moment, because I forgot to stop sharing. Here's the graph. That's the one, right, Ricker? I think this is the one. So okay, first thing to say is we manually classified authors by gender based on their names, which is 100% imperfect. And I want to acknowledge that. With that, we're looking at the ratios of female arguments, male arguments and non-gendered arguments across female authors and male authors. And the ratios themselves are quite similar.
Ricker, can you repeat what the ratios were, just now that we're looking at the graph?
I think what you see here, yeah, I think it works out to something like 35.7% female arguments from female authors and 31 or 32% from male authors. So not very different.
Yeah, I think the main thing that you can see here that maybe is interesting is the comparison between the red bar, the one on the left of the three, and the green one, which is the one on the right. So for male authors, these are basically the same. You know, the choice of female arguments and the choice of non-gendered arguments seem to be kind of the same proportion. Whereas female authors choose more female arguments, but just to some degree. That's possibly the most interesting aspect of this particular graph. I do believe that statistically the numbers were not different. And we tried looking at this from a couple of different angles in terms of how to handle co-authorship, first author, and single author or multiple authors, and it didn't seem to be substantially different, as I recall.
Yeah, another question from Julia Sallabank, who brought up the issue you alluded to of using constructed examples versus corpus examples. So she asked, why is there still such reliance on constructed sentences? Why not use data from corpora of language in use? So it'd be interesting to see to what extent these would reproduce bias. So there's a bit of a response in the chat from Chris and from Ricker, as well as a comment from Peter Austin, so I don't know if any of you want to elaborate on that point.
So I feel like I'm at a disadvantage. I'm the only one who isn't following the chat.
I can recap what I said, I guess. So this is a great question.
And of course this was part of our, you know, research question from the start, really, and so we had other journals and were trying to figure out what we could extract, and so part of it is a methodological limitation. There's just something about, you know, the way that we present these constructed sentences which makes them easy to extract, so you can gather a lot of data to then hire undergraduates to analyze. The nature of corpus data is that people don't really speak in complete sentences, and so that created a very difficult logistical barrier to just building a data set. So we're super interested in the question. It's partially for logistical reasons; ultimately, basically, there was just so much less data that we could identify automatically that we didn't feel like we could compare the two data sets head to head in a way that we could say corpus behaves like this, constructed behaves like that.
And I think one other thing to add there, if I remember correctly, is that a lot of examples had first and second person arguments, so there were just significantly fewer third person arguments, even once you identified sentences and were able to chunk them up out of conversations and into what we might think of as individual sentences. There was generally just an order of magnitude less data than in these other journals that we were looking at.
Thanks. And another question from Mila, or Mila, who was asking, are there any existing studies on gender representation in various language teaching books? I'm afraid the results might be similar.
So there is a study specifically looking at Spanish. There are some studies looking specifically at English instruction. There is a study looking at French journals. They all find the same kinds of findings. So yes, I think this is probably quite general.
And another question from Sila, or Sila, asking if, you know, the writers of the textbooks have some particular reason for continuing to construct biased language in textbooks. So what are the possible factors you may think of? I think some of this is unconscious, but what's the bigger picture here, or do you not want to get into a social analysis of why these disparities exist?
I don't think that I want to attribute motives to anyone. I do think that we could guess at some things. So for one, as I was just saying, this is a part of a larger societal trend. I apologize for the cat being loud. So this is a part of a larger societal trend. So, you know, if you were to just go with what comes to mind, and you were following the kind of general things you think about individuals in kind of standard contexts, you might be more inclined to think about a male argument than a female argument, or something that's more stereotypical. And you might also not have realized that you're doing it. I don't think this is conscious. I think I want to say that. And in particular, I flagged at some point too that in the newer studies we're not seeing sexually explicit and suggestive language. And I think this is important, because one way to interpret that is that at some point around maybe the aughts, the field identified that there is some issue with the way example sentences are constructed. And this is like the glaring thing, right? So those examples are kind of the most egregious. And so you can go in and just fix those and remove those, and once you've done that you've improved things in some way, but you've also just obscured the view; you've made it much harder to find and identify now that there's an issue.
So now you need to have the kind of study that we were doing to show you that, while any one individual example maybe is okay, or the majority are, on the whole we still have an issue. So, yeah, speculating here, right? We don't know why.
Yeah, I mean, I would echo the idea that this is unconscious, especially if you just think of how much work it is to actually write these. And so we're just analyzing what somebody, you know, poured hours of study into, trying to construct an example to explain a very abstract concept in a clear way. So the last thing that was on their mind was which gender pronoun did I use, and so then, you know, these unconscious biases are going to come out, just because there's so much mental energy, I think, being poured elsewhere.
As an answer to Peter Austin's comment in the chat: once we started doing the study and presenting it, there has been, I think, increased interest in improving. So in particular, I'm aware of two authors of textbooks. One of them was at the point where they had finished a draft and it wasn't published yet, and they've gone in, and so I referred them to a student to do this kind of analysis and target and improve the ratios of arguments in their example sentences. And the second is an author of a published textbook who is interested in doing this for the next edition of their textbook. So yes, hopefully this will actually change something in the field.
That's great. Yeah, that does make sense. You kind of need help to do this; it's kind of hard to do it on your own as an author. I'm going to leave it there with our questions. We're now over the hour and I see some people have had to go already.
But let me say thank you again to Hadas, and to Chris and to Ricker, who have been here helping with some questions as well, and to those who have worked with you, collaborating and bringing these issues to our attention for reflection, and hopefully changing some of our practices in linguistics and making it a better field for everyone. And thank you to Sarah too, who's here and in the background answering questions. Thank you.
All right, thank you for joining us and for being patient. Thank you for the questions. I appreciate your time. Bye everyone.
Bye everyone.