A fan club for all of us here, most of us I think know him very well, so he just drove me to the introduction because he's one of us, or used to be one of us, but he actually left us a couple of years ago and went to the University of Surrey, for good or for worse. We were all very, very happy. But anyway, he's now there and he's quite happy there, I guess. And he does a lot of work on morphology, syntax, typology, and language documentation and description, working on all sorts of interesting languages, particularly some African languages and some Tibetan languages, and that's what he's going to talk about today. It's a joint talk actually, but his co-authors are not here, I guess. No. Yes, I will. Yes, and you'll introduce your topic, but basically it's about referential density and differential argument marking.
Thank you very much, Irina. Thanks everybody for coming. It's really lovely to be back at SOAS; it always fills me with a nice warm glow to be here. And then I leave and I go, thank God I'm not there anymore. There is a handout, but as you can see from my very high-DPI pictures, it's probably taking Kandid ages to print it. So it will come, she's doing it now. And everything on the handout is the same as on the slides. Okay, so yes, when I worked here, I worked on African languages, but now I work in Nepal. And what I'm going to talk to you about today is, well, sort of, referential density and differential argument marking in three of the languages that I'm working on with my co-authors: Kristine Hildebrandt, who many of you know, who's at SIUE, which is in Edwardsville in Illinois, and Dubi Nanda Dhakal, who is our other co-author, who's based at Tribhuvan University, which is in Kathmandu. As you can imagine, Illinois and Kathmandu are quite far away, and as much as the draw card of SOAS is wonderful, they couldn't make it today.
Okay, so this picture here is of Nar village, which is where one of the languages that I'm going to talk about is spoken, and it's only spoken in one village. And I thought it was a fitting picture because they live in quite a densely packed village. Okay, so the main issue today is case. If you were interested in agreement, you're not going to get any, I'm afraid. So I'm going to talk about case and the relationship between case and differential argument marking, and how that fits in in a language that has very low referential density. So just to recap on what I mean by case: case is a grammatical strategy for assigning and identifying the role of NPs through morphological marking of dependents, such as core arguments and adjuncts. And the thing about case is there are different ways of assigning it, both in purely descriptive terms and in theoretical terms. So one way you might talk about it is that it's structurally predictable, that you get a given case because you're in a particular structural position. That's the kind of thing that you might think about if you're a minimalist, for instance. Or it can be lexically predictable, meaning that it's assigned based on a particular lexical item. So a particular verb says, I need my subject to be in dative case, rather than, say, in nominative or ergative, which might be the normal structural case, for instance. Or it can be semantically determined, and this is what we're talking about when cases are purely spatial, for instance. So cases that talk about spatial relations, which are not really core arguments at all. And that's another way of looking at it. So it's a bit different from saying that something's lexically determined. But the point I'm going to make today is that the case marking potential of an argument is not always invariably governed. So this means that we don't always get consistent case marking.
So even if we've got cases that fall into these different types, we can also have things which might be called probabilistically determinable case. This happens when case marking is based on variable characteristics of the governor and the governee. The governor is the thing that says, you go in this case, and the governee is the thing that has that case. So the governee is the noun phrase, and the governor is a preposition or a verb. When we have this, there's a less straightforward relationship than the normal mapping that we have, which is where you have the argument structure or case frame of a verb and its dependent NPs talking to one another in a fairly straightforward way. So on then to differential argument marking. Now when we have variability in the types of case marking that we can have on a noun phrase, this can give rise to differential subject marking and differential object marking. These are terms, particularly differential object marking, that you're probably familiar with already. Differential subject marking is just the same thing, but it's on subjects. This can be done in a number of different ways. I'm only going to be talking about differential case marking today because there's no agreement in these languages. But it could also be done through indexing on the verb or other types of verbal morphology. So here are a couple of examples from Manang Gurung. It's called Manang Gurung because Gurung is an enormous language, and I'm only talking about the language spoken in Manang district today, which is a district of Nepal. So we're talking about a variety of Gurung here. Here is an instance where we've got differential subject marking based on a tense split. So in (1a), we've got a past tense sentence: Yesterday I dug a hole in the ground. And here you can see that the ergative marked pronoun, Ngai, is possible, but it's not possible here without the ergative.
This is what you might think would be normal in a fairly straightforward transitive type sentence, or complement taking type sentence, let's call it. We can see, if we look at 'ground', which here is the goal, or whatever you want to call that argument, the third argument, that here it's got locative case, and you can't leave off the locative case, right? Now if you look at the second example, here we've got an example in the non-past, so present or future. So here the reading is: Tomorrow I will dig a hole in the ground. The ground behaves in the same way. You have to have locative case on that; whether you consider that an argument or an adjunct doesn't matter for our current purposes. But what you can see on the subject is that either the ergative can be there or not. So the ability for ergative to be there or not is the kind of thing I'm interested in. This optionality is only within the non-past. So we've got an aspect, sorry, a tense split, and then within the non-past, we then have differential ergative case marking. So the object of the research is to find out what's going on when we have this. Under what conditions do you have one and not the other? So you're now familiar with differential argument marking, differential object marking, differential subject marking; let's throw in another differential. It's differential ergative case marking. So this is a type of differential argument marking. It's just when we're talking only about the ergative case, whether the ergative case is there or not. Now it's often referred to as optional ergative case marking, and for good reasons this is a terrible term. Anything that is 'optional' is a dreadful, dreadful term in linguistics because it's not optional. It just hides the fact that we don't know what it does, why it's behaving in this way. So the project that I've got at the moment is on optional ergative case marking, but I'm going to refer to it as differential ergative case marking from now on.
So when we've got differential ergative case marking, what we find is the presence or absence of case marking, but this is 'optionally' absent, in inverted commas, because it doesn't have any consequences for the grammatical function of the NP, and by that I mean that the NP is still the subject. It's not changing into an object or any other type of oblique role. It's still the subject. Now what I don't include here is anything to do with information structure, so that's not what I mean by grammatical function. Grammatical function here means what its role is, whether it's subject or object or whether it's some other type of argument or adjunct. Now differential ergative case marking has a fairly restricted distribution, but it's commonly found in certain areas. So it is found in little patches all over the place, but the main areas where we find it are in the Himalayas and in Australia and Papua New Guinea. So I'm obviously going to be talking about the Himalayas today, if you hadn't already gathered that from the picture and the references I've made so far. There are lots of different factors that are involved in differential ergative case marking, and some of these include information structure, position on the animacy hierarchy of the argument in question, and whether there are any tense, aspect, mood splits in the language. Now why was I talking about referential density? Well, the main reason is because these languages have extremely low referential density. What this means is that the preference for argument realisation is such that you very rarely, or much less so than, say, in a language like English, have overt noun phrases; whether headed by a common noun or a pronoun or whatever, they just don't turn up as much as you might expect them to. Now this is a bit of a problem when you want to look at case marking, because case marking tends not to turn up unless it's on a noun or a pronoun.
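As a rough illustration of the notion of referential density described above, the sketch below treats it as the share of available argument positions that are actually filled by overt noun phrases. This is one common way of operationalising the idea, not necessarily the measure used in the project; the `Clause` class, the `referential_density` function and the sample counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    """One clause: argument slots licensed by the verb, and how many are overt NPs."""
    argument_slots: int  # hypothetical field: positions the verb makes available
    overt_nps: int       # hypothetical field: positions realised by an overt noun or pronoun


def referential_density(clauses):
    """Return overt argument NPs divided by available argument positions."""
    slots = sum(c.argument_slots for c in clauses)
    if slots == 0:
        raise ValueError("no argument positions in the sample")
    overt = sum(c.overt_nps for c in clauses)
    return overt / slots


# Invented sample: four clauses, eight argument positions, two overt NPs.
sample = [Clause(2, 1), Clause(1, 0), Clause(2, 0), Clause(3, 1)]
print(referential_density(sample))  # 2 / 8 = 0.25
```

A low value means most argument positions are left unexpressed, which is exactly why case-marked noun phrases are scarce in texts from such languages.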
The reason why I'm talking about referential density here is because we've got three languages with very low referential densities, so we don't see noun phrases as much as we might expect. Therefore we're going to expect the case marking to turn up much less than we would hope. What we're going to examine today, or what I'm going to try and show you that I'm attempting to examine, because it's quite difficult given the data set, is the conditions under which A/S arguments can be differentially marked or omitted in these languages. And the important point here is that they've got low referential density, so we have to explore which factors influence the manifestation of A/S arguments at all, as well as whether they get case marked or not. So there are two things here: when do the arguments turn up, and when they do turn up, when do they get ergative case marking? So 'when do they turn up' is the referential density; 'when do they get case marking' is the differential argument marking. Okay. So these are the languages that I'm going to talk about today. This is part of two projects. One of the projects is funded by the British Academy, and it's a project on optional ergative case marking. That's a two-year project and we're going to be at the end of the first year in May. So this stuff is really rather preliminary, I should say. The other project is a five-year project. That's a project run by Kristine Hildebrandt at SIUE. Some of you may have seen a talk given by us at LDLT4 a couple of months ago. That project is on mapping sociolinguistic variables to Google Maps style maps of the Himalayas. So satellite imagery, and there's text linked into this sociolinguistic information. And we're collecting lots as part of both of the projects. We're collecting discourses as well as sociolinguistic interviews, and some elicitation too. And other elicitation takes place in Kathmandu. So this is going to report on a subset of the results from that work.
And it's also on a subset of the languages, because we're working on four different languages. Three of those languages are Manange, Manang Gurung and Nar. These are all Tamangic languages, so it's the Tamangic languages I'll speak about today. The fourth language is called Gyalsumdo. It's a Tibetan language which until recently was thought to be part of the Tamangic group, but Kristine and Joe Perry have demonstrated that that's not the case. So it's Tibetan and it's really quite different from these languages, and I'm not including it, just in the interest of time more than any other reason. So you can see here, even though this is a wonderful project with wonderful maps, I've gone old school and given you a very basic map here because it involved much less effort. This is Nepal here. So just above this black region is Tibet. We're about 15 kilometres from the border with Tibet. Not that you could get there very easily, of course. The blue area is where Manange is spoken. The green area is where Manang Gurung is spoken. The orange area is where Nar is spoken, and Gyalsumdo is kind of intermingled with Manang Gurung in the green area. So in many of the Gyalsumdo villages there are also Gurung speakers and vice versa. Nar village is very, very high up and very, very isolated, and it took us 11 hours of walking to get to the place where we were going to stay the night, to then walk another 6 or 7 hours the next day to get to the village. So it's very, very remote. And I was very ill when I was doing it too. So let's get on to some data then, and you can see what kind of concerns I've got about dealing with these issues. Here is some data on differential ergative marking in Manang Gurung. And there are two sets of examples here. In the first set we've actually got some intransitives. Okay. And the reason why I'm showing you intransitives is because if I showed you a transitive it would be very boring.
So if it's past tense, it's going to have an ergative mark on it in an elicited utterance. Okay. So there's no point in showing it to you. It's boring. This is more interesting because I'm just showing you that the split is not quite along the lines of transitive versus intransitive, or complement taking versus not complement taking. What we've really got here are some splits which are determined by other properties, in this case a property of the noun phrase and of the verb. Okay. So it's not every intransitive verb that will do this. Just a subset. They're kind of unergative type verbs. We haven't quite got it right what we think they are yet, or how we might classify them. But here we've got 'jump', and with an animate, a human animate, we have to have the ergative case marker on 'the boy jumped'. Okay. But if it's a goat that's jumping, the ergative marker is not possible. So there's a distinction here, not just in that animates have to have it, but rather humans have it and other high animates don't. Okay. In Gyalsumdo we see other types of splits that don't quite mirror this. So a kangaroo behaves like a human, but sheep and fleas behave differently. Okay. So we have all sorts of weird splits going on. Okay. The third example here is kind of like the one that I showed you earlier. It's just the case of the split between past and non-past. So in the non-past we've got this situation where we can have both ergative marking and no ergative marking, that is, an unmarked subject NP. And again what this shows is that we're not sure what the parameter is here. And the problem with elicitation of materials like this is that ergative case marking shows up in past tense clauses fairly regularly in elicitation, but it doesn't in discourses. Okay.
So there's no point in spending a lot of time trying to work out what's going on in this elicited sentence, because what we really need to know is how it's being used in discourse, because that's going to be much more informative about the distribution of ergative case marking. Okay. So how are we going about doing this? Well, I've just said let's not bother with elicitation too much, although we are doing elicitation too. What we're doing is we've collected lots of texts. They've been transcribed with native speaker help. So our postdoc Dubi is doing that with native speakers and another Nepali transcription assistant. And the discourse data is then entered, unfortunately by me, because it's very tiresome, into this database. And this is a very horrible screenshot of what it looks like, but what I'm basically showing you here is the fields that get entered for each verb. So each record is a verb in a discourse. So imagine you have a sentence that's got three verbs in it. They're each going to have a record. Okay. The records match up to how they're set up in Toolbox. Each verb is then given a number according to how it occurs sequentially within its Toolbox record. So the third verb in the first record will be verb three in Toolbox record one, but in the second sentence the third verb will also be three, if you see what I mean, because it will be in reference two and it will be the third verb there. Okay. So what we've got here in red, you can see, is information about the subject. This is information about whether it's a common noun, a pronoun, a kinship term, et cetera. Bless you. Whether it's person marked, number marked, case marked, definiteness marked, possessed, has a demonstrative, whether it's quantified with a number or any other quantifier, whether it's attributively modified or whether it's modified with a clause. Okay. So just filling in one subject takes a while, and they're just the things that you can see.
There's also a box of semantic-y things over here which are not necessarily marked. And the reason for this is that you might say that something doesn't have number marking, but it may well be plural, because number marking doesn't always turn up either. Okay. So I'm trying to make this as broad as possible. This is just part of the database. For every argument the same information is being encoded. Other information that's being encoded includes what its argument potential is, so whether it can take a complement or not; how many verbs there have been since the last mention of the subject; whether it's co-referential with an earlier subject; whether the subject is anaphorically or cataphorically available. Lots and lots and lots of information, so filling in one verb takes some time. It's quite slow progress. The stuff in blue at the top is about aspect, tense, nominalization marking, whether it's a converbial form, et cetera. Okay. And the stuff at the top in the header is just about what text it's from, what language, when it was added, et cetera. Okay. So we're using data, as I said, from stories. The stuff that's in the database at the moment is stories or just narratives, kind of expositional texts, but it will ultimately include lots of different things, but not elicitation. Okay. It's all spontaneous or semi-spontaneous, and what we claim then is that this permits the exploration of linguistic variability through exploring consistencies and subtle differences among the languages under investigation. So we're getting similar types of data from the languages. It's not as beautifully constrained as I would like it to be, but that's just to do with how much is in there at the moment, and you'll see what I mean by that in a minute. Okay.
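To make the shape of one database record easier to picture, here is a minimal sketch of how a verb record of the kind described above might be represented. It is not the project's actual schema: every class name, field name and example value is hypothetical, chosen only to mirror the categories mentioned in the talk (one record per verb, keyed by its Toolbox reference and its position within that reference).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArgumentCoding:
    """Coding for one argument, e.g. the subject (all field names hypothetical)."""
    overt: bool
    np_type: Optional[str] = None        # e.g. "common noun", "pronoun", "kinship term"
    case_label: Optional[str] = None     # e.g. "ERG"; None when there is no case marker
    number_marked: bool = False
    definiteness_marked: bool = False
    possessed: bool = False
    has_demonstrative: bool = False
    quantified: bool = False
    attributively_modified: bool = False
    clause_modified: bool = False
    semantically_plural: Optional[bool] = None  # may hold even without number marking


@dataclass
class VerbRecord:
    """One record per verb in a discourse."""
    language: str
    text_id: str
    toolbox_ref: int      # number of the Toolbox record the verb occurs in
    verb_index: int       # sequential position of the verb within that record
    complement_taking: bool
    subject: ArgumentCoding
    converb: bool = False
    nominalized: bool = False
    verbs_since_subject_mention: Optional[int] = None
    same_subject_as_previous: Optional[bool] = None

    @property
    def key(self):
        """Identify the verb by text, Toolbox reference and position in the reference."""
        return (self.text_id, self.toolbox_ref, self.verb_index)


def ergative_subjects(records):
    """Return the records whose overt subject carries an ergative label."""
    return [r for r in records if r.subject.overt and r.subject.case_label == "ERG"]


# Invented example: the third verb of Toolbox reference 2 in a made-up text.
rec = VerbRecord("Manange", "story01", 2, 3, True,
                 ArgumentCoding(overt=True, np_type="common noun", case_label="ERG"))
print(rec.key)
```

Keeping morphological marking (`number_marked`) separate from semantic values (`semantically_plural`) reflects the point made in the talk that an argument may be plural without any number marking turning up.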
So what kind of variables are we interested in then? Well, here they are. These are all based on the discussion in Chelliah and Hyslop, who had a volume on differential argument marking in the Tibeto-Burman area, and the ones in red are things that we can encode in the database, although speech predicates, actually, that isn't strictly encoded, but you can tell whether something's a speech predicate or not, so at the moment it doesn't seem to be that relevant to me. So: the predicate valence, clause polarity, aspect, tense, person, number, animacy, humanness, definiteness, specificity, referentiality, et cetera; whether it's a heavy NP or not, so whether it's modified, that's what all that modification stuff is about, and that's also included; whether it precedes or follows; whether there's a switch in the agent or not, actually that isn't strictly encoded. So they're the things that we could encode and we aim to encode. The stuff in green is stuff which is much harder to encode because it involves... but these are also claimed to be important in differential ergative case marking. So whether the agent is volitional or not, or whether they had control of the situation or not, has been demonstrated to be relevant in some languages. It's sometimes used for contrastive focus purposes. This is quite difficult because we don't know enough about the languages; it's not that it's difficult per se, but we just don't know enough about the focus situation yet. Subjective judgments of the speaker, so what their attitude is towards the event, and socially unexpected actions, are again really difficult to encode in a database unless you've got cultural awareness, and even if you have got cultural awareness it will be very, very difficult to encode if you're not the speaker. Okay. And I'm doing the stuff in the database, so there's just no way that's going to happen. Okay. So to give you an idea of what's in the database so far, here is a table showing what we've got for Manange, Gurung and Nar.
It's a very small set of data, and this will come up again and again. It poses lots of limitations for what I can talk about today, but as I say, this is a project in progress, not a final project, so you should just take these as being indicative of what we're trying to do rather than what our results are. Okay. So what this table shows then are the three languages there in the columns, Manange, Gurung and Nar, and the red row at the top says how many verb forms we have for each language. There's also Gyalsumdo data, which is about the same; they're roughly around 100. I don't know why the Gurung one is so low. We've got loads of Gurung text but they just haven't been entered yet. So it's a small data set. That means that there are, you know, almost 300 records, between 250 and 300 records, for those three languages. The next row says how many verbs there are with overt noun phrases, whether as their intransitive subject or as the subject of a transitive or ditransitive, and you can see that out of those 129 verb forms only 37 have overt subjects.
Now of course some of them are dependent clauses, and they wouldn't have overt subjects in English either, but most of them are not. So it's quite a low number, and of all of these, so with 129 verbs, we get three ergative marked noun phrases. So you can see for the Manange data it's really, really low and it's really, really difficult. To just get three I had to do 129 records in the database. Now I could just pick and choose them, but the point is it doesn't make any sense unless you know what's going on in the preceding records or the records afterwards. Similarly, if you look across the table you'll see that within this data set there are just three ergative case marked forms for each language. This is the important part here for our purposes: these are the complement taking verbs. By that I mean things that take complements like objects or clauses or non-finite clauses, but not copulas that take predicative arguments; they're not complement taking in this definition, they're counted as intransitive. So of these 129 there are 35 transitive verbs that could have an argument, seven of them do, and only three of them are ergative. We've got 33 in Gurung, seven of those have an overt argument and three of them are ergative, so this is a subset of this number. And with Nar as well, 34. So we've got a similar data set for each language here, and we can see that they've got roughly the same distribution of overt A's and roughly the same distribution, well, exactly the same distribution, of ergatives, but of course they're percentages so they're not quite identical. Now if we double this in terms of verbs we should expect to have double the ergatives, so to get ten ergatives it's going to be 300 verbs. It's quite a lot of text, it's quite a lot of work, but we'll get there at some point in the future. This is work in progress, and so there are some data problems and there are some solutions that we can apply to it to see if we can get a bit further along. As I said, elicited data is unreliable for
determining splits because the generalizations do not extend to discourse, so as soon as you look at discourse you're none the wiser. Now if you're just interested in what somebody's got in their grammatical system, what they think of in terms of what's grammatical and what's not grammatical, then you can do elicitation. If you're interested in how language works, then I think this is a better way of doing it. So the data set exhibits low referential density for all three languages, and this leads to a minimal capacity to case mark NPs, so we just don't get enough NPs to look at. There's limited text coding, so that's the amount of data that I've managed to code so far, but also the frequency within that, so the relative frequency, like three ergatives for 129 verbs, reduces the power of the statistical methods that can be used on this data. So with more data you can use more powerful statistical methods, but with this data there are some methods that you can't use. Basically what happens is the test is expecting to find a certain number within the cells of your tables, and if it doesn't find the certain number that it requires, it rejects that data and the statistic doesn't work. So you have to be able to interpret when you can use the statistic and when you can't, but there is another test we can use. What's the solution then?
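The point about a test "expecting a certain number within the cells" can be illustrated with a short sketch of a chi-square test of independence on a 2x2 contingency table (overt versus unexpressed argument, same versus different reference). The code assumes the common rule of thumb that every expected cell count should be at least 5; the talk does not state which threshold or software was used. The function names and the counts in `example` are invented placeholders, not the project's data.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 degree of freedom).

    Assumes no row or column total is zero, otherwise an expected count is zero.
    """
    exp = expected_counts(table)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def cells_large_enough(table, minimum=5.0):
    """Rule-of-thumb check (an assumption here): every expected count >= minimum."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


# Invented counts. Rows: overt NP, no overt NP. Columns: same referent, different referent.
example = [[4, 30], [60, 26]]
chi2, p = chi_square_2x2(example)
print(round(chi2, 2), p < 0.05, cells_large_enough(example))
```

When `cells_large_enough` returns `False`, as it would for a table containing only a handful of ergative-marked forms, the chi-square approximation is not considered reliable and a different test is needed.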
What we're going to do is look at some of the variables contributing to the presence or absence of overt arguments, to elucidate the conditions under which case marking of subjects is possible. So what I'm going to do first of all is try and deal with a quite obvious but statistically significant parameter, okay, continuity of reference. Now in English we're very used to having a subject, okay, so although I haven't looked at discourse data in English, of course, I'm going to claim that there are lots of times where you would need a subject in English but you wouldn't in one of these languages. Okay. Now what I looked at here, well, I did it, but what we looked at, was whether there was a relationship between continuity of reference from one verb to the next and whether there's an overt argument or not. It's a very, very simple test. What we're looking for, or what we would predict, is that if there's a switch in reference then a noun phrase will turn up, right? Because we don't have them most of the time, but we'd expect to find it. Okay. Now it's a very simple thing and very boring in many respects, but it's something that we can do with this data, and what it does is it tells us when we've got our noun phrases, and the data is quite striking even on this small set. So remember, what we've got here are 129 verbs from Manange, 86 for Nar and 71 for Gurung, and what I did was use Pearson's chi-square test to examine whether the discontinuation of subject reference, so that's the same ref versus different ref, is a predictor of the realisation of an A or an S with an NP. So I'm not just looking at transitive clauses here; it's transitive and intransitive, complement taking and not complement taking. And what you can see, this is the null hypothesis in grey, is that the null hypothesis is not upheld. We do have statistically significant results. So the null hypothesis states there's no relationship between these two factors. There is: there's a significant association between continuity of reference and
avoidance of overt A/S arguments in all three languages. So basically, if you look at the data at the top in the table, these are called contingency tables. What they tell us is what the frequency count is of each of these within the corpus. So of the 129 there were three examples where the reference continued to be the same and there was an overt noun phrase. So that's the thing that we weren't expecting to happen, but it does happen a bit, of course, and this is good data then, because we don't expect it all to be neat. If your data's too neat it means that it might be wrong. You want to have things that are a little bit like this; that's why we use the statistic. So what this shows then is, if you've got an overt A or S noun phrase, that's the column on the left, then it's very likely to have a different reference. So three of the times it was the same reference but 36 of the times it was different. So that's what this statistic tells us, and this is just reporting that it's significant. This means chi-square, this is the degrees of freedom, and this here is the important part, the bit with the p: it needs to be less than 0.05 to be significant, and all of these are less than 0.05. Okay. So that is one thing I managed to do on this data. I'm very pleased with myself. It's a very limited data set, so to do something like that's pretty good. Now let's look at some data. How am I doing for time, actually? I could probably talk for hours. 20 minutes, right. So now we're going to look at some data from the individual languages, and I've talked you through what we're interested in. This is one of the villages. It's called Tula Manang and it's situated in a wide part of the valley. It's extremely beautiful, and these are the terraces where they grow their vegetables, basically, and this is the capital of Manang district, where they speak Manange. Okay. So remember, we've got three examples with ergative case marking. That isn't enough data to say anything meaningful about what we're looking
at. The complete data set is transitives, or complement taking verbs rather, where there is an overt argument. That's the data set, not just the ergatives. We're looking at when it's there and when it's not there. Okay. So here's the data, well, here's the discussion of the data; the data is on the next page, so you can look at it on your handout if you want to at the same time as the slides. In Manange discourse, ergative is marked by an enclitic, and this follows the plural number clitic, if it's there, and the definiteness clitic, so it's at the end of the noun phrase, where we would expect it to be. And ergative marking is used in discourse to denote a switch between equally agentive participants, protagonists. Okay. So it's used when there could be some doubt over who is doing what. It's saying that there's a switch: who we were talking about before is no longer being talked about. Within these clauses all objects or complements are overtly realised; we'll see why this is relevant later on. And they're in the kind of clauses that you might expect them to be in: main clauses, evidentially marked for these sorts of languages, or tense marked. Well, these are evidentially marked. I mean, some of them are tense marking languages, some of them are evidential marking languages, some of them have a bit of a combination. So it's in a main clause or a converbial transitive clause where there's a different subject to the matrix. Okay. So it's important that that shows up. Okay. So here's some data. These are not sentences in sequence, although they are from the same story. They're just nice examples. So here we've got: After, the yaks who stayed on the hill cursed them. Literally 'him', because the plural marking is kind of up the wall in these languages. So in red in (4) is the ergative subject of 'curse'. Okay. So it's SOV here, so the pronoun that follows that is the object, and what you might glean is: the yaks who stayed on the hill cursed them. What we've got here
is a switch: we were talking about 'them' before, that's the yaks which are at the bottom of the hill in the valley. So there are two sets of yaks, right? So this is how we know they're equally... they have the same properties. There's just some up a mountain and some down a mountain, and I don't think anybody's ever claimed that up and down a mountain is a variable for optional ergative case marking. Okay. The second example there has got the same issue: Saying 'become like this', they made a curse. We've got here a switch from the previous clause, which you can't see here; (4) is not the clause before (5). And similarly in (6): The friends were saying, are they coming back or not? After that they didn't come back. You can see in these examples they're talking about different participants. The ergative case marking there is not because they're agentive, well, it is because they're agentive, but it's not just because they're animate or plural. There are cases where they're animate and plural and it doesn't get ergative marking elsewhere. It's because we're switching between two equal sets. And what's interesting about the Manange discourse is when there's an overt noun phrase but it's not ergative case marked. Although they're very infrequent, these are the three examples which didn't fit my statistic nicely. Well, I said they were okay, but some people looked like they didn't like it too much. So these are the three aberrant examples. They appear when the noun phrase is used for maintenance of reference. So we've got the ergative showing up when we're switching, but if we need to maintain reference for some reason, we've got a bare noun phrase. Okay. So the presence of the noun phrase is determined by what's going on in terms of the structure of the discourse, but so is the ergative case marking. What I should point out though is the ergative case marking couldn't turn up on just any argument in that scenario. It's got to be because it has certain properties, like animacy. The
ergatives turn up, they're always animate when they turn up in the data set we've got. They're also all definite and specific. For these unmarked ones, the objects are realised, but these are also main clauses, but they actually have a different characteristic in terms of their time marking, and we don't know what this 'no' is at the moment, but these show up when 'no' is there, so that's suspicious, but it's interesting. Okay, so distinguishing between the roles of arguments within the clause is not well motivated by this data. So the reason why I claim this is because the objects show up in the ones which are unmarked and the ones which are marked for ergative case. So anybody who ever claims that ergative case marking turns up to differentiate between an object and a subject, they're kind of missing the trick, because when you look at lots of data that isn't what they're doing at all. Okay, so it's not well motivated for that reason. Rather, it supports a view that erg will be marked on As when there's a greater likelihood of distinguishing references across clauses.
Okay, the next language we're going to look at is Manang Gurung. So this is Gurung; it's got like a million, I think it's got like, I don't know how many, do you know how many speakers it's got? Half a million. It's massive, and it's spoken over a really wide area, but this is just Manang Gurung, which has like maybe a thousand or so speakers. Okay, so this is some children at school in Natchae village. Okay, now, this erg case marking isn't just showing up in the data set when there's a switch between participants; there's also lexical case in these languages. Okay, so although I'm talking about differential argument marking in general, some of this stuff is not differential; it has to be there. So in Manang Gurung you have to have ergative case marking with a verb like 'know'; it's lexically specified. So I'm not claiming that all ergative case marking belongs to one type, rather that in some of the data it looks as though
lexical considerations are afoot. So here are some elicited examples, but I'll come on to discourse examples in a minute, of the verb 'know'. This is the verb meaning to know someone or something, rather than knowing a place. Okay, and what we have here are two non-past sentences; it's a question-and-answer pair. Ergative case marking is required on the second person singular subject in the question and the first person singular subject in the answer. It's not possible to have it without ergative case marking, and the object or complement needs to be in the dative case; it can't be unmarked. Okay, now this only makes sense when we compare it with another verb, like 'know a place'. So here you can see the relationship between the semantic roles and their case marking: to know someone or something has an experiencer in the nominative, sorry, it shouldn't say nominative, it should say ergative, ergative, and a stimulus in the dative, whereas knowing a place has the experiencer in the dative and the stimulus is unmarked, and I put ABS in there because that's the unmarked case in these languages, but there's no overt marking for it. Okay, so here we've got those examples. It's not possible to have it without it, and what would have been nicer here was to show that you can't have it with the ergative as well, but I'm afraid I do not have that data on these slides; it's probably in the notes somewhere. So these verbs say 'I need my subject in a certain case'. It's not differential argument marking; it has to be that way. So what we're going to expect is, if we've got an overt NP for one of these verbs as a subject, it's going to have ergative case marking on it in the discourses, and that is actually what we find; I'll show you in a second. So certain verbs have invariable lexically determined case, resulting in the presence of a case on an overt A even when it might otherwise be probabilistically predicted based on other characteristics of the discourse. Okay, that's how you know that it absolutely must be lexically
specified: if it doesn't meet your characteristics, your probabilistic characteristics, then it's aberrant, it's kind of an outlier, and you have to be able to explain why it doesn't behave in that way. So when we look at the discourse, we do see this pattern. So in the Manang Gurung discourse, erg is marked by an enclitic on pronouns in the data set, so it's only on pronouns in the discourse data that we've got. Two of the three verbs that have ergative subjects are the verb to know someone or something, okay, and as I said, it lexically governs case on its subjects. I've put 'pronominal subjects' in here in brackets, but I was just being cautious when I did that; it should be all subjects. Referential density is so low that there are only two unmarked transitive subjects, and these are subjects of nominalised verbs, and so there's not really much to say about those, and I can't really make any generalisations. But here's the data: this is an example from the discourse, and here's an elicited example to demonstrate again that it's not possible to leave the ergative case marking off this verb, off this subject.
Okay, then on to Nar village. Nar village is very high up, so this is mountain region now, okay, and it's again very, very beautiful. That's the village on the front page of the slides. Okay, the data from Nar is very well behaved and really nice to look at, and it was really, really easy to analyse, and that's lovely. What we have here are two sentences from a discourse. Again the ergative subject is in red, and what you can see is, when we've got the ergative noun phrases, in each case we've got, in the English translation at least, a dependent clause and then the main clause: 'Having returned to the village, the villagers sent me packing again.' The person returning to the village is the speaker of the story; he is the topic of the discourse in general. Okay, and what we find here is that we're getting the ergative noun
phrase when we're switching from the general topic to some other type of temporary topic. Okay, so in both cases it's very clear from this data that that's what's going on. So erg is marked by an enclitic; again, it follows the plural number marker. It marks non-discourse topic (so discourse topic I mean to be the general topic of the discourse, so you can switch during it) of transitive and ditransitive. All objects are unrealised in clauses with erg subjects, and all clauses are affirmative, past tense or main clauses. And in two cases (I really like these examples) the transitive verb is the V1 in a serial verb construction with an intransitive V2, so what we end up with is that the subject, the S of the V2, the intransitive verb, is the same as the P of the transitive, right? So we've got the P of the transitive and the S being co-referential. I wonder if that was an example, there in 11. Okay, so 'remove', 'come home'. Okay: 'The villagers removed me, I came home', right? So it's quite a nice example of how this is patterning in terms of complex verb structures. Okay, what about the unmarked ones? Well, all subject NPs without case marking in complement-taking clauses are pronouns or kinship terms used as topics. All objects are realised in those clauses, and the verb form is usually non-finite, in contrast to the cases where the ergative is there, when it's usually a finite clause, so there's a finite main clause. Okay, so in the Nar data what's interesting is there's an asymmetry between the presence of objects in clauses with unmarked transitive agents and ergative-marked transitive agents. And now, this may just be a fact about the data set, so I don't want to get too excited about it, but it is interesting because it makes the languages look very different from one another, and I suspect they really are quite different from one another in terms of the way that their discourse is structured. We actually see that there are many more, I can show you in the table in the question period if you like, there's
a massive difference in terms of how many transitive and intransitive clauses there are in each of the languages, and that may be affecting what's going on. Okay, so what I did then, just to see whether it was interesting or not, is to look at whether there was a relationship between having a complement and having ergative case marking. Okay, so this is using Fisher's exact test; this is what you do if the chi-square isn't going to work because your data set is too small. So we examined whether the presence of a complement in a clause with a complement-taking verb (so this is a verb that can have a complement, but it might not have one there overtly; it might be that you can retrieve it from the discourse, just as we know that they have subjects but they're retrievable from the discourse) and an overt subject, we wanted to see whether that's a predictor of case marking. So the null hypothesis is that there's no relationship between the presence and absence of a complement and the presence and absence of ergative case marking. The result is that there is a significant association between the presence of ergative case and the presence of a complement in Manange, but an association between the presence of ergative case and the absence in Nar; and Gurung was not significant. So the Manange data and the Nar data is the only data that's significant here, and what's going to be really interesting moving forward will be to see whether this still holds when we've got enough data in the database to move on to a chi-square test, and see whether that really is a pattern that's representing something about the structure of the discourse.
Okay, so what I've spoken about today is a work in progress, and what I've tried to show you are some attempts to deal with very small data sets. Okay, in an ideal world, how you make this better is you get someone to put loads and loads and loads of data into the database. It's set up, it's ready to go, the distinctions have been made; it's the
manpower that's required. When I present this to you in two years' time it will be wonderful, but what I've done in the meantime is start to think about what problems I'm interested in and how to start to solve them, and what kind of power I need in my data to be able to solve those problems. So the major problem, I think, is the low incidence of NPs, because this means we're not going to get enough case-marked NPs; so the low referential density, and of ergative-marked NPs too. The presence of case marking is not strictly determined by the grammatical function of an NP, because these really are the subjects, but also its information-structural properties, or discourse-structural properties maybe, if you don't feel comfortable with information-structural properties there. Switches in reference were shown to have a statistically significant relationship with the occurrence of overt NPs, okay, so that's how we can factor out that issue. In Manange, the presence of unmarked A NPs was associated with continuity of reference, while erg-marked NPs denoted a switch. Okay, so we saw a distribution in that data. A significant relationship was shown between the presence of erg and the absence of a comp in Nar, and this demonstrates (this is something I mentioned earlier, but the statistic demonstrates that this is statistically significant) it's not associated with distinguishing two arguments of a verb. That's not why erg is showing up, because in Nar they don't occur together, so it's got nothing to do with that. The incidence of erg-marked NPs is too low in the data set to determine relationships between other factors, such as person marking, definiteness and number, but one thing we can say is that all of the erg-marked NPs are animate, but we might have really expected that, because inanimates don't tend to do much. So determining the distribution of ergative case marking, I argue, can only be understood by understanding what can be expressed by its absence. Thank you very much.
Loads of food for thought. I
have two questions, which I'll try to keep very brief. The first one pertains to slide 12, favourite of Na's table. I think this one? Or do you want the big one? Actually not this one; this one, where you make the prediction that if you would double the verbs you would also double the instances of ergative marking. And I'd like to challenge that a bit, but I think you can actually turn it to your advantage, because what you have shown is that the presence of ergative marking is linked to the type of participant of the verb, to animacy for instance, the position on the animacy scale. And of course we know that certain verbs lexically subcategorise for human or animate participants, and in a small discourse sample it really will crucially depend on the protagonists of the story, so one story with one human protagonist can tell you your entire data set. But for your case, since you are interested in not just the absence but the presence as well, you can use that by selecting genres and particular topics. For instance, you can use visual stimuli that yield certain linguistic descriptions that will have a high incidence of human, animate and other different positions of the animacy...
Absolutely. I should say that the choice of text that's in the data set at the moment was entirely determined by the order in which they were ready, and that we could make confident, yeah, claims about what the argument structure is, whether it could take a complement or not, what the case possibilities were, and that in itself was dictated by which of them were less complicated to start with by the transcribers. Absolutely, yeah. I mean, as the data set grows we'll start to be able to stratify it for genre and participants and all sorts of things to make even stronger claims, but that's a long way off. But I totally agree with you, so yes, it's true, we shouldn't read too much into this distribution at the moment. Different types and tokens, so different lexical types, as in lexemes, and different instances of a particular lexeme. Yeah, okay, yeah,
that's a good idea. And then my second question is related to a slide, I thought I had it down, 12, no, 18, sorry. There you talk about verbs where you say they have a difference in thematic roles, and I'm not quite sure that I would like to agree with your analysis of the thematic roles, because for me thematic roles are generalisations about certain types of thematic participants, and what you mention here looks more to me like semantic frames, so frame-semantic participants, so not at the semantic level but at the conceptual level. Because if something is marked differently, you can't have an experiencer in the nominative and the dative; for me that would be an index that these are actually different semantic participants, and that in the one case what you call the experiencer is actually construed as an agent, whereas in the dative case it is construed as an experiencer.
No, that's a totally fair criticism, and I think that that could be an alternative way, certainly, of talking about it. The reason why I chose those semantic roles is because I was really taking from Bickel's referential density paper: he had sets of pairs rather like this, where he was claiming that they had exactly the same semantic roles but different case marking associated with them. So they were pairs like 'to be afraid', so one was like a periphrastic version of the other, but they had the same semantic roles but different case frames. I rather calqued it, if you like, directly into the talk. So I don't have a strong feeling; for me it doesn't matter, actually, whether they do have different semantic roles or not, because it demonstrates more or less the same thing for me, that things which may look like they're the same might behave differently. It's lexically specified, though, that's for sure, because not all things that have experiencers would definitely need to have an ergative subject; that's what makes it different. Yeah, thanks.
I have a question about the slide which is called continuity of reference; I think it was number nine,
this one. Yes, yeah, that one. I noticed that all three of the languages also had quite a lot of NPs which had neither an overt A/S nor the same reference, so it's minus and minus on that chart, and there's 299 etc., and that's quite a lot. I was just wondering if you could say something about those types of arguments.
Yes, all right, now that's an excellent question. My suspicion is that, if you look at the Manange table, it's very different from the others, and that's because it has a very high proportion of intransitive verbs in the data set, and I think it's to do with intransitives. So I think there's something about intransitives which means that they're not the same reference as the previous verb, but they're not getting an overt noun phrase. I don't think it's within the transitives. So I could test that within the subset; I didn't, because I was so pleased with myself for having a chi-square statistic that actually worked. This isn't the only one I attempted to do, I should point out. But I think that's what it is, and I'm not sure what the rest of the story is, though.
Can you kind of infer who the S/A argument is just from the context, even if they're switching reference all the time?
Yeah, I think, yeah, I mean, so hopefully one way of maybe coding that would be to look at things which are referents which are visible in context, like in the context of the discourse for instance, so speakers, participants who are present. So that's a really good suggestion, actually; I might have a look and see if I can work out what's going on. You first, and then Alex after.
Yeah, that counts for the plus same referent, minus examples, but it's the number with the 56 that's the problem, where you've got the different referent but you've got no overt noun phrase. I dropped the part and it broke. Yeah, sorry, I see what you mean. Yeah, that's okay. Yeah, sorry about that. Yes, that's also worth looking into. One thing that the database doesn't code at the moment is continuity
from object reference to subject reference, and I think that that's the important part that's missing, and that should pick up on your suggestion, so I'll look into putting that in.
Because in my research, if I have similar data in Tibetan narratives, then I'll get a lower proportion of, in terms of what, if it happens it would be something like that. Yeah, that's highly likely, actually, because you'll never ever get a third person inanimate pronoun, something like that; it just wouldn't happen. The other thing I was wondering about is that it's still sort of familiar to me, because I do Tibetan narratives, almost guess some example, is that it is definitely abnormal to have an inanimate agent at all; it just doesn't happen, with major exceptions, so that changes the game.
Yeah, I mean, that's why I only mentioned the animacy in passing, because it's just not really a very exciting variable. What's much more interesting is where there's a split within what we would consider to be animates, so the flea not being able to be ergative marked but a kangaroo being able to be ergative marked, that I mentioned earlier. But those sorts of things are just going to be so rare in terms of discourses that it's actually not worth worrying too much about it. In a big corpus that would come out, and it will come out in elicitation, but yeah, you're right, I mean, the animacy parameter is not as important as some of the other ones are. So I think humanness is more important, but I've got too many, like, yaks and stuff at the moment, so the yaks are doing too much stuff, but when the humans are doing, the humans tend not to turn up, that's the thing, because they're usually discourse participants.
How do you define transitive and intransitive verbs in a language that regularly drops, that does not express, participants?
Yeah, okay, right. Okay, so complements are often there, okay, so object complements and clausal complements are often there. Okay, so being able to take a clausal complement is what I've taken as the
benchmark for saying whether something is complement-taking or not. Okay, and something that is labile, a verb that really is genuinely labile, and when it's not there it's necessarily intransitive, or could be necessarily intransitive, those verbs are more obvious, so it's easier to work out when to code them. But the basis of it is, the first stage is: can this verb ever take a complement? Okay, then we know that it has the potential for taking a complement; then looking at the text and seeing whether it's dropped its complement or not. There's no syntactic test that we have the capacity to do on that at the moment, but that's the basis of the decision making in those terms. You know, the only other way you could do it is by saying, well, this is what I'm going to list this verb as in my dictionary, okay, this is its potential: intransitive, labile, transitive, whatever. But really what's important for me is whether it could have a complement and doesn't, and I need to have it as a binary thing. So yeah, it's a little bit flaky, but I think those sorts of judgments usually are quite flaky. I mean, if you want to decide whether 'eat' is transitive or not, you know, we could speak for an hour about that, so it's one of those perennial problems, unfortunately.
So what about object participants, do they get dropped as much as subject participants?
No. So none of my tables will show you whether the object is there or not, but, well, let's have a look; we can have a look at the database, if it will open. I don't think it wants to open while I'm in this funny view. Oh yeah, cool. Okay, so if we wanted to see whether, let me just escape from that, if I can. No. Okay, right, so let me do a search. Okay, so what I want to do is see whether something is complement-taking or not and whether it has an object or not. Okay, actually that will do it. Okay, so define the right row. Okay, so out of the 300, this is across four languages, okay, just to give you an idea,
so this is 162 records out of the 385 are coded as having the capacity to take complements, and let's see. Oh, flip. Okay, it looks like about a third of the time they're there; that was the impression I got from that. There's some spuriously empty ones; if there's empty data it means it wasn't important in computing the statistics, so it's partially full.
Yeah, that's really interesting, because you mean you have thirty-three percent of instances, which is ten times as often.
Yeah, I mean, the objects show up way more than the subjects.
Yeah, that actually reminds me of Du Bois's preferred argument structure, the basis for ergativity, so that you have absolutive-type arguments more often expressed. That would be very interesting to look at, wouldn't it?
Yeah, I think that's definitely got to be the next step in dealing with the data, because the object stuff has just not been important so far, but I think it's going to be. Yeah, thanks, Alex. Ed?
In one of the languages, yeah, so that's an association. I'm wondering whether that is ultimately the reason for the ergative, or if there are other factors that might unify the various distinct reasons. I'm just thinking in particular of something discussed, which was prosody, and whether or not that's relevant at all.
Okay, it's very cruel of you to ask me a prosody question first thing. I would really need Christine here to be able to answer a question on prosody, so I'm not going to attempt to do that. I really can't say anything about whether the prosody is an important factor in whether something is being marked as ergative or not. There are times when it's clear that there's a prosody break between a noun phrase and what follows it or what precedes it, but I don't think our prosody analysis of the languages is at any stage of discussing it in these terms. I mean, what you can't see in this data is that there's lexical tone in all of these languages, and that isn't marked in
the data at the moment because it's still being worked out, so to be able to work out what's going on with the prosody, I think we'd want to feel a bit more confident about what's going on with the lexical tone first. So, I mean, what do you have in mind? What would you anticipate?
Oh, that a switch in reference...
Yeah, okay. I mean, you'll notice that I didn't draw any attention to a statistic which shows that that's the case. I just said that all of the cases that I've seen are doing that. Okay, that happens to be a fact about the text that those instances are in; whether that extends more broadly is unclear at the moment. So the reason why the statistics are nice to employ, and there's only a few in this talk, is because they'll tell you whether one factor alone is sufficient for determining a particular pattern or whether you need more than one factor. So as the data set grows and we can do what's called a log-linear analysis on this data, then we can combine multiple sets of data into a single test, and then remove them one by one to see whether a stronger prediction is made. Okay, so the idea is that if you put in three variables and it predicts it perfectly, what if we only have two? Can we predict it perfectly that way? Because that's a better way of doing it, if you only need two to predict your outcome. But I can't do that, because the data won't pass chi-square, and it won't pass log-linear until it passes chi-square, so that's the restriction. It could be, but I don't think the data set's big enough to think about that, but it's a good idea anyway. Yeah, Tom, did you have anything?
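As a rough illustration of the small-sample restriction described here, the sketch below computes a two-sided Fisher's exact test for a 2×2 table (ergative marking present or absent, against complement present or absent) and checks a common rule of thumb for when a chi-square test is considered usable (every expected cell count at least 5). The counts are invented for illustration and are not taken from the talk's database; the function names and the threshold of 5 are assumptions of this sketch, not the speaker's procedure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


def expected_counts(a, b, c, d):
    """Expected cell counts under independence for [[a, b], [c, d]]."""
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / total for col in cols] for r in rows]


def chi_square_usable(a, b, c, d, minimum=5):
    """Rule-of-thumb check (an assumption here): all expected counts >= minimum."""
    return all(e >= minimum for row in expected_counts(a, b, c, d) for e in row)


if __name__ == "__main__":
    # Hypothetical counts: rows = erg marked / unmarked, columns = complement / none.
    table = (3, 1, 1, 3)
    print("Fisher p-value:", round(fisher_exact_two_sided(*table), 4))
    print("Expected counts:", expected_counts(*table))
    print("Chi-square usable:", chi_square_usable(*table))
```

With these invented counts every expected cell is 2.0, so the rule of thumb rejects chi-square and the exact test is the fallback; the p-value comes out at about 0.486, which is nowhere near significance for a table this small.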
Yeah, just going back to that point about whether subject or object arguments are kind of expressed or not: I think it's quite areally typical that P arguments are frequently omitted if they're topical, and as the S and A arguments will have a kind of greater tendency to be topical, then it's likely that more of them will be omitted. The cases when you have P as overtly expressed is likely when it's new information, and those can also be omitted if they're already topical in the discourse. That's like quite typical.
Yeah, I think with the objects it will be important to encode whether they're first, I mean, there's no system for encoding first mentions in here at the moment, but that's the other thing that needs to go in to be able to start looking at that, because without that the other added parameters aren't going to make much sense, and I think that you're totally right.
Yeah, there's no kind of first mention, or whether it's new or not, then there's no...
Yeah, I mean, yes, and there's no way you could plug that into the statistics. So at the moment the first mention thing is just impressionistic here; what's topic is impressionistic, I should point that out, right? It's not based on a counting method or anything like that. There is actually a method for tracking reference in the database, but I don't think it's quite watertight enough to claim that that's robust using a statistic. Yeah, the first mention stuff needs to be coded. Yeah, thank you.
You know, I was wondering whether you could put the findings in wider comparative or functional or historical contexts, like one could, I expect, yes, because it seems to, not coming from the language area, so it doesn't seem, I mean, you know, it's case terminology, yeah, but it doesn't look very much like a... Oh, I see, right, okay. So then the question is, so you have
the restriction on the tense aspect, you have, you know, you find it in pronouns, you find it lexicalised with certain verbs, so in what sort of tracking system is that?
Right, okay. So the alternative to calling it case: you could imagine calling it something like topic marker for some languages, right? I'm not necessarily talking about the languages I've got here, but there are different things that you could call it other than case. Lots of people don't call it case or use a case term like erg; they use something like agentive. The thing is, that's just another way of presenting the same information.
Consistent with your findings either?
No, I mean, in some of these languages the form of the ergative is the same as another case form, okay, which suggests that it really is genuine case. So in Gyalsumdo the genitive and ergative have the same form, and that's a typologically well-known combination of case, so that looks like case, but that's the Tibetan language. Within a broader context, it seems that the parameters that affect the occurrence of ergative case marking, or what I'm calling that, are consistent with what's going on in other languages, not just in the Himalayas but also elsewhere. What is difficult to tell is the directionality of it, and there are two ways that you could look at it, that it's developing or disappearing, and I personally don't have an opinion on that, but I'm sure scholars who are more acquainted with the area would do. But I'm sensing I'm not answering your question very well; what is it you...
But also because presumably older studies have been based much more on elicitation than text, so this is quite new.
Yeah, but I think this area in general is quite new, because I don't think that people have been really considering this from a distributional perspective before, so
I mean, the stuff that exists on this topic is sort of, you know, it's like in the last 20 years; it's not old stuff. I can't speak about really old grammars, but there's sort of a flurry of interest in it. At ALT this year nearly every other talk was on differential argument marking in some form or the other; everybody's looking at it. And I think that that represents that people are starting to get interested in it, particularly differential subject marking, because, you know, people have been interested in object marking for a while, but I think with differential subject marking, now people are starting to claim things. I know for Tibeto-Burman, or I suspect, based on the Chelliah and Hyslop book (I think it was a journal volume, actually), that no statistical accounts have been done of this in Tibeto-Burman languages. So that's really the direction that it needs to go; that's what I'm trying to sort of get towards. I'm wading through mud at the moment, but getting in that direction.
Yeah, it's something like that for switch reference in Classical Tibetan, and case model is fairly consistent on agents, so it looks like it's getting sort of worse of a time at the same time that they develop spare space bits within Tibeto-Burman. Did you publish any of this? Did you publish none?
We will. Okay, do share it with me when you're ready; anyway, we'd really like to have it. So Tibetan is a different language? Yeah, I mean Gyalsumdo. Gyalsumdo, I don't, I was writing the Himalayas, yeah. But yeah, no, no, but in answering that question I was talking about Tibeto-Burman, like somebody doing statistical work on it. Yeah, Gyalsumdo is not like this, though. I mean, Gyalsumdo is very different: it's got a conjunct/disjunct agreement system, it's got loads more cases, it doesn't look like this, doesn't behave like this, so that's one reason to exclude it for the time being. But it's very interesting, and that's got all sorts of differential argument marking, object and subject. Okay, more or less. Yeah, good stab.