Hello to all of you, and welcome back to another video in our 5 Reasons series. This time we will talk about corpus linguistics, and we would like to give you five reasons for its importance. As usual, I have an expert with me who knows the topic much better than I do. She is one of Germany's leading corpus linguists, Anke Lüdeling from Berlin, also well known for her work in morphology, and last but not least she is the secretary of Germany's Society of Linguistics. Welcome, Anke.
Thank you. I am happy to be here.
So as usual I will not introduce our guest any further. Just google Anke Lüdeling's name and you will find her straight away on the web. But just in case, here is her website. So Anke, you have been around a lot; you have been at several universities. Let me ask you the usual question first: why did you become a linguist? Was there any particular motivation that triggered your interest in the field?
Yes. I grew up in a very tiny little village in Ostfriesland, East Frisia, with no other linguists around. And when I was in high school, my godmother, who was an African scholar and was at the University of Bayreuth at that time, sent me all kinds of feminist literature, and among it there was a book by Luise Pusch. You probably don't know her.
I have heard of her.
Yes.
Marlis Hellinger was one of my teachers.
And I read that book and I found it very fascinating and radical. I mean, think: 80s Ostfriesland. And on the back of that book it said that she was a linguist, so I thought, I will be that. And then later, we had to go to this kind of job center where they advised us on where to study, how to study, et cetera. I asked them, where can I study linguistics? They said, well, you're going to marry anyway, so you can do that; you will never be able to make money with that. But then of course, when I started linguistics, I found it fascinating. I never did any feminist linguistics at all.
So that's a good motivation, but let's now talk about the five reasons for dealing with corpus linguistics, or with corpora. One reason is certainly that modern linguistics uses corpora for exemplification, as example banks. Would you subscribe to that?
Of course. Linguists have always used real examples from literature to exemplify whatever grammatical statements they make. Look at any linguist in English linguistics or German linguistics, in the 19th century already. They always say, I don't know, let's say, genitives work this way, as you can see from blah, blah, blah, and then they give you all kinds of citations.
How did they find those citations? Intuition? In the past it was just intuition.
No, no, no, they found citations, real literature citations from Goethe and so on. They've always done that. So they had to go to the library. So they had to read everything. And then you have a new topic, and you have to read everything again. And then you have all kinds of file card systems to have examples for this construction, that construction, et cetera. Well, you can still do that, of course, but now you can do it systematically. You can just type a query and then find something. And that's interesting. And I just want to mention one thing. What happens if you don't find an example of whatever you're trying to look at in your corpus? Many people have said, well, if you don't find an example, the thing doesn't exist. That's, of course, not true, because language is infinite and you always have only a finite sample. Every corpus is only a finite sample of whatever is there. But if something that you expect to be there is just obviously absent, that might mean something. You have to be careful with your conclusions, but you can actually learn something from things not being there. And you can only do that using corpora.
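As a rough illustration of what "type a query and then find something" can look like, here is a minimal keyword-in-context sketch in Python. The three-sentence corpus, the function name `concordance`, and the regular expressions are all made up for this example; real corpus tools offer far richer query languages. The last query shows the "nothing found" case, which, as said above, only tells you about this finite sample.

```python
import re

# Hypothetical toy corpus; a real study would load documented corpus files.
CORPUS = [
    "The colour of the king's robe was red.",
    "She admired the robe of the king.",
    "The king's horse stood by the gate.",
]

def concordance(pattern, sentences, width=20):
    """Return (sentence_index, left_context, match, right_context) per regex hit."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for i, sentence in enumerate(sentences):
        for m in regex.finditer(sentence):
            left = sentence[max(0, m.start() - width):m.start()]
            right = sentence[m.end():m.end() + width]
            hits.append((i, left, m.group(0), right))
    return hits

# Two competing genitive constructions, found by query rather than by rereading.
s_genitive = concordance(r"\b\w+'s\b", CORPUS)
of_genitive = concordance(r"\bof the \w+\b", CORPUS)

# An empty result means "not in this sample", not "does not exist".
absent = concordance(r"\bqueen's\b", CORPUS)

for idx, left, match, right in s_genitive:
    print(f"[{idx}] {left}[{match}]{right}")
```

Returning the sentence index with each hit keeps every example traceable to its source, which is what the old file card systems did by hand.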
And what I learned from colleagues who are using corpora, and from my own work, is that corpus linguistics is especially important in language variation.
Oh, OK. Yeah.
Could you argue for that point, please?
Yes. We all know that our linguistic behavior differs from situation to situation. I mean, just think of spoken informal dialogues or formal written term papers or whatever. We've always known that, but we've never known exactly which linguistic features depend on which other co-varying variables, as we call them, or independent variables. And that includes the purpose of the talk or whatever text, dialogues between people, and also features that are speaker features, hearer features. Do I know you well? Do I like you? Et cetera, et cetera. Of course, I like you. And I would talk differently now if I didn't. But we can only study that systematically once we have corpora where all those external co-varying variables are in the metadata. And we find much more fine-grained differences than we previously thought.
Now, corpora don't just come out of the blue. They have to be created. They have to be developed. And one principle of development is annotation. In what way can annotation be an argument for corpus linguistics? It's hard work, isn't it?
Yeah.
But it's helpful.
Yes. And it's necessary. I think it's necessary for good scholarship, actually, because I think that one of the most important issues in linguistics is that you can make your analysis of the corpus data visible through annotation. I mean, just think of simple things like part of speech. And then you think, OK, we all know what a noun is. We all know what a verb is. But we don't all know whether a given participle, a given instance, is a verb or an adjective.
In a particular context, yes.
And so if I do, I don't know, an analysis of participles, and I want to convince you of some grammatical point about participles, you should be able to see my analysis.
And previously what we did is, we have a text, and we analyze that text, and we categorize the words in that text. But we do that on a piece of paper, in a Word file, in an Excel file, on a file card. And then we give you, the reader, the colleague, only the final finished analysis, but you cannot see the interpretation step. With annotation, finally, we can make that visible. And I think that's absolutely necessary for scholarship. And I think we should never go back.
Yeah. So you would probably agree with me when I say modern linguistics is, well, almost impossible without corpus analysis. So we need corpora for quantitative analysis.
Yes, we need corpora for many things. But I would not say that we need corpora for everything. I think it depends on the research question. We also need other kinds of data, of course. I mean, corpus data can be augmented with other kinds of data. And then there are, of course, research questions where corpora are not useful. But corpora are useful for any type of quantitative analysis, because finally, you can count. You can count the words, you can count categories in the annotation, you can have interesting, more complex statistical models, finally, that you couldn't do before. You can use that for building hypotheses about language that you can then test using other corpora or using other means, like psycholinguistic experiments or something, and you can actually model languages. That means you can build interesting models that will predict what could happen, and then you can see whether that really happens. And that's something that's never been possible before.
Yeah, OK. Now, when we talked about this content beforehand, you told me there's one final reason, a sort of, you called it, a meta-reason for using corpora. What can we understand by that meta-reason?
OK. I think parts of linguistics are becoming more like a science than like a humanity.
And by that, I mean that finally, in certain parts of linguistics, for certain areas of linguistics, we want to have results that are testable, replicable first, and then reproducible using other data. And corpus data, if it is available, if it is well documented, if you have enough metadata, and if you have annotation with everything there, meaning documentation, evaluation, et cetera, then, and only then, can you reproduce results.
OK.
So I think that's absolutely necessary.
I understand. So thank you very much for this final plea. As usual with our guests, they could go on and on, because they know so much about the field. But we have to stop here because we want to produce short and concise videos; thus we have to confine ourselves to just these five reasons. So on behalf of all our viewers, thank you very much for your help, Anke, and all the best for your future career.
Thank you.
Thank you.
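The points made in the conversation about annotation, metadata, and counting can be pulled together in a small Python sketch. Everything in it is hypothetical: the token list, the tag names, the register labels, and the function `tag_counts_by_register` are invented for illustration, and the tags assigned to "broken" are not a claim about the correct analysis. The idea is only that once each annotation decision is stored with the token, anyone can inspect it and recount it.

```python
from collections import Counter

# Hypothetical annotated tokens: (word form, part-of-speech tag, register).
# The register stands in for the metadata variables mentioned in the interview.
TOKENS = [
    ("the", "DET", "written"), ("broken", "ADJ", "written"),
    ("window", "NOUN", "written"), ("was", "AUX", "written"),
    ("broken", "VERB", "written"),
    ("it", "PRON", "spoken"), ("got", "VERB", "spoken"),
    ("broken", "ADJ", "spoken"),
]

def tag_counts_by_register(tokens, word):
    """Count how often `word` carries each tag, separately per register."""
    counts = {}
    for form, tag, register in tokens:
        if form == word:
            counts.setdefault(register, Counter())[tag] += 1
    return counts

result = tag_counts_by_register(TOKENS, "broken")
for register, tags in sorted(result.items()):
    print(register, dict(tags))
```

Because the tag for each instance of the participle is visible in the data, a colleague who disagrees with one decision can change that single annotation and rerun the same count.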