Okay, so today I would like to present a method that I started working on for my dissertation and have continued to work on since then, on and off, a lot with my colleague Kevin Ryan, who is at Harvard in Linguistics, and more recently with a colleague, Taylor Arnold, who was at my current institution, the University of Richmond, in the math department. The method has been developed specifically to guide reconstruction of the Rig Veda, but I would like to think that it has some broader applications. So I will be, as always, interested in your thoughts on this particular method, but I'm at least as interested in your ideas of where we might potentially use it elsewhere. Should I stand more at this point? Thank you. Okay.

All right, so just a brief intro to the Rig Veda. This is the oldest text of the Indic branch of Indo-European, also called Indo-Aryan. The language is likewise called Vedic, and unlike Classical Sanskrit it's a living language. We know that because we observe changes in every part of the grammar between the oldest Vedic text, which is the Rig Veda, through Middle Vedic and Younger Vedic, until it reaches Classical Sanskrit. What the Rig Veda contains is a bunch of praise poems, mostly praising the deities of the Vedic pantheon and trying to motivate them to attend the ritual where the poem is being performed and to benefit the ritualists.
The entire text is in poetic meter, and all of the meters have in common that they regulate syllable count and also syllable weight distribution. Some of them also have caesurae, that is to say, points in the verse line where a word or phrase break is required. It's important to know about Vedic that there's a two-way syllable weight distinction between light on the one hand and heavy on the other. If a syllable ends in a short vowel, it counts as light, and all other syllables are heavy: long-vowel syllables, short vowel plus consonant rhymes, long vowel plus consonant rhymes, and so forth.

So here's a rough representation of the eight-syllable verse type. I should start with the breve. I'm using that to show positions that are usually implemented with light syllables, and with heavy ones only about a third of the time. Then the X just shows relatively free positions that are light or heavy anywhere from one third to two thirds of the time, and the macron, or long mark, shows you preferentially heavy positions. You can see that there's essentially an iambic rhythm to the verse that goes, you know, da-DUM, da-DUM, da-DUM, da-DUM. And there are two additional, well-known principles at work here.
One, I think this is on the next slide, yes, is so-called final strictness. This applies to the verse line, and it essentially says that the later in the line, the more strictly syllable weight is regulated. I guess there are good typological parallels for this as well. The other one is final indifference, and this applies just to the last position, or the last syllable, of the verse, which is relatively indifferent to weight. So even though, analyzing this as an underlyingly iambic meter, you would expect to find heavy syllables in the last position, final indifference permits you to implement that with either a light or a heavy syllable.

The eleven-syllable verse line has a caesura either after the fourth position or after the fifth. You can see there's an iambic opening, like DUM, DUM, da-DUM, and then da-DUM, DUM da-DUM, DUM; or you could go da-DUM, da-DUM, DUM da-DUM, da-DUM. I have to stress the preferentially heavy positions to be able to hear them, because I'm an English speaker. And the twelve is really just like the eleven, except that you descriptively insert a preferentially light position in the penultimate position of the verse. So it opens the same way, da-DUM, da-DUM, and then you have the same middle, and then instead of having DUM, da-DUM, DUM, you have DUM, da-DUM, da-DUM.

Okay, I'm almost done with the description of the meter, so bear with me. Basically you just put these verse lines together in threes or fours, usually, sometimes in other combinations, to make up a stanza. You can show that there's more structure than just the line. It's not a purely stichic verse type: there's evidence for a couplet-like structure, for stanza structure, and so forth. But for us it's really just the verse line that's important.
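The weight rule and the iambic cadence described above can be sketched in a few lines of Python. This is only an illustration: the `Syllable` class, the function names, and the encoding of the cadence as "light, heavy, light, anything" are assumptions made for the sketch, not the representation used in the actual project, and resyllabification across word boundaries is ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """Hypothetical minimal syllable record used only for this sketch."""
    long_vowel: bool   # True if the nucleus is a long vowel
    coda: str = ""     # consonant(s) closing the syllable, "" if none


def weight(syl: Syllable) -> str:
    """Light ("L") only if the syllable ends in a short vowel; otherwise heavy ("H")."""
    return "L" if (not syl.long_vowel and not syl.coda) else "H"


def scan(syllables) -> str:
    """Turn a sequence of syllables into a string of weights, e.g. "LHLL"."""
    return "".join(weight(s) for s in syllables)


def fits_iambic_cadence(scansion: str) -> bool:
    """Assumed cadence of 8- and 12-syllable lines: light, heavy, light, then a
    final position that may be either weight (final indifference)."""
    last4 = scansion[-4:]
    return len(last4) == 4 and last4[:3] == "LHL"


# pu-ruṣ-ṭu-ta: short open, short closed, short open, short open
purustuta = [
    Syllable(long_vowel=False),
    Syllable(long_vowel=False, coda="ṣ"),
    Syllable(long_vowel=False),
    Syllable(long_vowel=False),
]
pattern = scan(purustuta)
cadence_ok = fits_iambic_cadence(pattern)
```

In a fuller treatment the weight of a word-final syllable would be left open until the following word is known, since a following consonant cluster can close it.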
So most of the Rig Veda, 83%, is composed in verse lines of the three types that I just described, and so for the rest of this talk I'm just going to exclude the rest as unusual meters, to be able to do some comparisons that I'll talk about in just a second. You can tally the number of verse lines, which are usually called pādas by Sanskritists, and then calculate the corpus size by multiplying the number of pādas by the number of syllables per pāda, to get a sense of how big the corpus is.

Okay, so the text, we think, was maybe composed around 1200 BCE, or Professor Hill suggested the 12th century. I don't think it really matters for our purposes; we don't really know exactly. It must have been composed over a relatively long period of time, because we do see differences between the language of the older parts of the text and the younger parts of the text. Then it was transmitted orally for many centuries in an incredibly accurate way, thanks to feats of memory that were of course motivated by the importance given to correct recitation in this particular culture or subculture. Nevertheless, there were some changes made to the text, and some of those involve the replacement of linguistically older, more archaic forms with younger ones. Sometimes the younger ones are the ones that are familiar to us from Classical Sanskrit, so basically you have some old Vedic stuff getting replaced with some younger Classical Sanskrit stuff.

Okay, so one of the examples that is usually given, because it's almost completely exceptionless, involves a sequence in the transmitted text of a consonant followed by a glide, either y or w, which is written with a v, followed by a vowel that has a grave accent mark on it. That grave originally represents falling pitch, and this is also essentially the only place where you find a grave accent in
the text. It's a pitch-accent language: as a rule, with some systematic exceptions, lexical words have one syllable that receives a high pitch, and the rise to that high and the fall from that high are phonetic. So this is a little exception to that. I should also say that the placement of the accent is almost, if not purely, morphologically determined.

Okay, so wherever you find this sequence, that is to say consonant, glide, and then vowel with falling pitch, the verse line is one syllable short. So paying attention to the meter motivates the reconstruction of an extra syllable. Just as an example, I took the word for 'sun', which is transmitted as svàr, and I took some eight-syllable verses with the phrase svàr dṛśé, which means 'to see the sun'. You can see that pratyáṅ víśvaṃ svàr dṛśé is seven syllables, but if you reconstruct súvar dṛśé then you get eight syllables. I think there are maybe hundreds if not thousands of examples of this kind of sequence. It's almost perfectly regular. It's a super clear case of basically paying attention to the meter and then restoring a slightly older phonological sequence.

The change that happened, obviously, was the gliding of the high vowel, and this seems to have introduced a new accent contrast into the language, because the pitch fall was then apparently phonologized as falling pitch. So súvar was at a stage where there was just one high-pitched, accented syllable, and then you say svàr, and there's falling pitch still on what's now the only syllable, and that gets phonologized as the accent that we write or transliterate with the grave. I hope that was clear.

So that was an example where we know, and have known for a long time, that we need to reconstruct an extra syllable, and now comes an example where we need to reconstruct something like that
but for syllable weight, not count. This is another classic textbook example. There's a root mṛḍ, which means something like 'be compassionate, take pity', and wherever the root is followed by a vowel, that is to say, wherever the first syllable as spelled consists of m and a syllabic r, paying attention to the meter of the Rig Veda shows that the first syllable must have been heavy. Normally mṛ would be a light syllable, since it consists of just a short vowel (syllabic r counts as a short vowel), but it appears either to have had a long vowel or to have been closed by a consonant. You can tell this, again, by looking at forms such as the imperative mṛḷaya, which means 'take pity'. The poets regularly place this at the end of eight-syllable verses, where we actually expect DUM, da-DUM, so we expect a form with a heavy first syllable.

So this is, again, just alerting us to the need for some reconstruction, and then we'll use internal reconstruction and the comparative method, as always, to insert the right form back into the text. Here the fact that the ḍ/ḷ is retroflex, plus the Iranian evidence, where in Avestan you find a form with a consonant between the syllabic r and the d, is going to motivate us to reconstruct something intermediate between that kind of form and the transmitted one for the Rig Veda. Whatever that was, it certainly still counted as a heavy syllable then.

Okay, so those are pretty exceptionless cases, and then there are cases where you actually have very good evidence for variation. It seems to have been up to the poet to use whichever form he wanted to. I don't think that this variation has been very closely studied, so there may be more factors than just metrical factors going on here, but it looks pretty metrically determined to me, and
so a well-known case is the dative-ablative plural suffix -bhyas, which can be realized as either disyllabic -bhias or monosyllabic -bhyas wherever it comes after a heavy syllable, and that's just up to you as the poet. Then there's the genitive plural ending, with an older disyllabic form, whose exact shape is very hard to tell, and the younger form -ām. It's only transmitted as -ām, but we see that essentially the poet got to choose whether he wanted to use monosyllabic -ām or the older disyllabic -aam when versifying. For example, if you want to close an eight-syllable or twelve-syllable verse, and remember those have a diiambic cadence, they go da-DUM, da-DUM, then you will say janānaam to mean 'of the people', and if you are composing an eleven-syllable verse then you're going to say janānām, as it gives you the right rhythm.

I guess many Indo-Europeanists would reconstruct this genitive ending as *-oHom or something like that, and I think pretty much everybody agrees that it survives into both Vedic and Avestan as disyllabic -aam, and then there's the younger form with the very unsurprising-looking contraction. Another reason to assume that it was originally disyllabic is the accent of the Greek genitive ending -ōn, which is circumflex and should probably be derived from a two-vowel sequence. I think that's the standard line on that.

Okay, so everyone agrees on the examples that I've shown you so far, I think, and people have studied this very closely, including a lot of people in the mid and late 19th century and the early 20th century, Hermann Oldenberg and E. Vernon Arnold being two of the most prominent. Many of those restorations were then adopted into a metrically restored, that is to say reconstructed, text that was published in 1994. So is there more to do? I would say yes. I think we can do more, and also should want to do more, because we are now in a position, I think, to be able to say relatively decisive things about
forms that are much less frequently attested and/or occur in parts of the verse that are not as strictly regulated. Basically, thanks to some advances in statistics and the like, we can be a little bit more sure about things than, say, early 20th-century scholars, who just didn't have the same sort of tools that we do now.

So here's an example, a verb form īśīya, which means something like 'I could be lord, I could be master'. It has the so-called optative suffix -ī-, so that's an irrealis-type form, followed by the ending for the first person singular in the middle voice, which is just -a. The form is only attested three times in the Rig Veda, and here you see it at the end of an eight-syllable line, where you expect īśiya with a short i, not īśīya. But if you look at the other two attestations, they're in the first half of an eight-syllable verse, where the meter is much less strictly regulated, and so people would not normally feel these to contribute much or any evidence one way or the other.

So what do we do? We say, well, it is in the cadence that one time, so how often do you expect that sort of thing to happen? It's unclear. I think the right thing to do, if you aren't in a position to do some careful statistics, is just to say: I'm not sure, it might just be one of the departures from the meter, I don't want to make too much of it. That's what Oldenberg did. Others felt freer and just did whatever they wanted, but Oldenberg was a very sober worker.

Okay, so what we're going to do now is first note that the localization of a word is partly phonologically determined, just by the nature of metrical composition, and then we're going to compare the way that poets localize a particular word with the way
that they localize all of the other words that have the same shape. When I say shape, I mean phonological properties that matter for the meter, so number of syllables and syllable weight distribution. Note that we also have to pay attention to the onset of the word and the rhyme, or the end of the word, just because of the way resyllabification works across word boundaries. So if you look at this, here is an example of rightward resyllabification at the end of the word, and here's an example of leftward resyllabification, where the syllable ends up being closed and counting as a heavy syllable. So these are the things we have to pay attention to: syllable count, the syllable weight template, if you will, of the word, and then some things about its edges.

Here's the way that I'll be representing this. The weight of the last syllable typically depends on what follows, and so that's why there's an X. Frequent words that belong to that shape class are puruṣṭuta 'much praised', parāvati 'in the distance', and dadhātana 'put!', a plural imperative. As you would probably have guessed, where the poets like to put this sort of shape is verse-finally in eight-syllable verse, where it gives a nice diiambic rhythm: puruṣṭuta, parāvati, and so forth.

So what we'll do is just look at all of them, and then we will express the localization pattern of that class as a vector. This just means that there are six that they put starting in the first position of an eight-syllable verse. So the verse would start with puruṣṭuta, and you get a one in the first position; you see that five more times and you get to six. Zero times starting in the second position, one starting in the third, zero times starting in the fourth, and then starting in the fifth, which is the latest you can put that in the verse, puruṣṭuta, five, six, seven, eight, that's where they put almost all of them when they're composing an eight
syllable verse. Okay, so then in eleven-syllable verse, in this particular shape class, we see that they localize most of them at the beginning of the verse, 17 of, whatever that is, 26. The post-caesural position, starting in the fifth position, is another spot where they'll put them. And then twelve-syllable verse looks like eight-syllable verse: they just put them line-finally, because the cadence there is likewise diiambic. Then we just put the three vectors together into one long vector, and so now we have captured, as a mathematical object, what you might think of as the metrical fingerprint of this shape class. So now we're in a position to compare individual items to the entire class, and that's what we're going to do. Thank you.

I don't want to suggest that formerly people were doing bad work in this area and now finally I'm doing really good work in this area. That's not true. But there are some advantages that I think we can point to. One is that we're just including a lot more information now. We're not just looking at the cadence of the verse, we're looking at the entire evidence, and even though the earlier parts of the verse are not as strictly regulated, they are regulated, and so these are informative things that we're adding to the picture.

We're also taking into account the relative frequency of a class in the three verse line types. For example, the type that we just looked at, the poets like to use that more in eight- and twelve-syllable verse than they do in eleven-syllable verse. You can come up with an expected frequency in each verse type, just generated by the relative size of those three subcorpora, and you see that they're avoiding them in eleven and/or preferring them in eight and twelve, and so that gets captured here as well. And we can work with word shapes that just aren't fit for the cadence, so they are only localized in less regulated
parts of the verse. And if we do the math correctly, then we're going to be treating the infrequently attested items exactly, and we can even say things about forms that are attested three or two times, or once.

Okay, so we could say something like: what's the probability that dadhātana, expressed as a vector, belongs to the class of light-heavy-light-X items that are shaped the same way, minus dadhātana? We'll get a probability value, and those will all be very small, so we'll just take the log of them so that they're easier to work with and you don't have things like 0.00012345 or something like that. Obviously, when you see something like negative 32.5, that doesn't mean anything until you put it relative to the other log probability values that you're coming up with. Just to give you a sense of this, the log probability values for the individual forms of the Rig Veda range from negative 918, that's least probable to belong to its class, to negative 30, which is most probable to belong to its class. For each class we can also come up with an average, so the average for this class is negative 31.5, very similar to the negative 32.5 that we have for dadhātana.

Now, because of differences in shape and so forth, the class averages are very different from each other. They range from roughly negative 250 to negative 30, and that's because of what I refer to as tight and loose classes. The one that we just looked at is tight, in the sense that there are only a few places where the poets can fit them, or do fit them, into the verse. Something like CVC-shaped monosyllables, that's with a short vowel, they put those in almost any position in the verse, and so it's in these loose classes that you really get to see the other
things that are determining word order in the Rig Veda, like syntax.

Okay, so here I'm just reminding you of īśīya, and notice that rāsīya, 'I would give over', is another like this, where you would expect rāsiya. This will be our very brief case study. We'll just look at all of the first person optative forms of this type, and you can see there are just 15 of them: 15 tokens and, one, two, three, four, five, six, seven types. So this is not a well-attested class.

For each form we're going to make two comparisons. We're going to compare the form with its shape mates, and then we're going to compare the form with its putative shape mates. That is to say, for example, we're going to compare īśīya with other things that are shaped like īśīya is spelled, and then we're going to compare it with everything that's shaped like īśiya, the form that we're considering reconstructing or restoring. Then we're going to ask: is it more probable that the form belongs to its apparent class, or is it more probable that it actually belongs to the class of things that are shaped like īśiya?

Leaving bhakṣīya aside for a second, we do find something that looks like it might be a distribution here. For īśīya and rāsīya, those two seem to have the suffix with a short i, that is to say, something to reconstruct. With the other ones you seem to have a long ī, which is the Classical Sanskrit form as well, and the one that's transmitted in the text. It also seems to be the case that you get the reconstructed form after a heavy syllable and the long form after a light one. That doesn't look crazy in a language that has a fair amount of morphophonology that promotes syllable weight alternation and is also in an environment, like iambic verse, that promotes syllable weight alternation. By this distribution we would expect bhakṣiya with a short i, because bhak is a heavy
syllable, but we find better evidence for bhakṣīya with a long ī than for the form that we expect. This is as far as this takes us. I would say the method suggests that we take the reconstruction of -iya with a short i, as at least a variant, pretty seriously, and then we just go back to doing what we always do, which is internal reconstruction and external reconstruction.

Here's a scenario that seems plausible to me. I think we know what the etymology of the sequence is, and so I guess that -iya with a short i may have been the regular outcome of that. That's also what Brugmann thought, but he didn't know about the two laryngeals. Then you get -iya changed to -īya by analogy. There are various ways to do this, but if you wanted to do it as four-part analogy, I guess you would set up a proportion with a form in which the vowel is always long, and X you would solve for -īya. You could also do it differently if you like. That seems like a plausible source of the younger form. What we would have then is another poet's-choice situation, where you have an older -iya that's still around, that the poets opt to use especially after heavy syllables, because they're composing in weight-alternating meter, and then you also have the younger form that we know from Classical Sanskrit as -īya.

So thank you for your attention. That's it in short. I'd like just to remind you that I am extremely interested if you can think of other places where we could try this out. Mainly I think we just need some sort of text the organization of which is fairly phonological, obviously not completely phonological, and where we need to figure things out about the phonology. Thank you.
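As a rough illustration of the comparison described in the talk, the following Python sketch builds a localization vector per verse type and scores a rarely attested form against two shape classes. Several things here are assumptions made for the sketch and are not stated in the talk: the probability model (a multinomial with add-one smoothing), the function names, and all of the counts, which are invented toy numbers rather than Rig Veda data.

```python
import math
from collections import Counter

VERSE_LENGTHS = (8, 11, 12)  # the three verse-line types kept in the corpus


def localization_vector(tokens, word_len):
    """Count how often a word (or class) starts at each possible position.

    tokens: iterable of (verse_length, start_position) pairs, 1-indexed.
    The three per-verse-type vectors are concatenated into one long vector.
    """
    counts = Counter(tokens)
    vec = []
    for n in VERSE_LENGTHS:
        for start in range(1, n - word_len + 2):  # last slot still fits the word
            vec.append(counts[(n, start)])
    return vec


def log_prob(word_vec, class_vec, alpha=1.0):
    """Multinomial log-probability of word_vec under the class proportions.

    class_vec must already exclude the word itself; alpha is an assumed
    additive-smoothing constant so that empty slots do not give log(0).
    """
    total = sum(class_vec) + alpha * len(class_vec)
    lp = math.lgamma(sum(word_vec) + 1)
    for x, c in zip(word_vec, class_vec):
        lp += x * math.log((c + alpha) / total) - math.lgamma(x + 1)
    return lp


# Toy data: a three-syllable form attested three times, once line-finally
# in an eight-syllable verse (start position 6) and twice in the opening.
word_vec = localization_vector([(8, 1), (8, 2), (8, 6)], word_len=3)

# Invented class counts: the class matching the transmitted spelling rarely
# ends an eight-syllable line; the class matching the restored shape often does.
transmitted_class = localization_vector(
    [(8, 1)] * 30 + [(8, 2)] * 20 + [(8, 6)] * 2 + [(11, 9)] * 50, word_len=3)
restored_class = localization_vector(
    [(8, 1)] * 20 + [(8, 2)] * 10 + [(8, 6)] * 60 + [(11, 9)] * 10, word_len=3)

lp_transmitted = log_prob(word_vec, transmitted_class)
lp_restored = log_prob(word_vec, restored_class)
restored_fits_better = lp_restored > lp_transmitted
```

With these invented counts the form scores higher against the restored-shape class, which is the kind of result that would prompt a closer look using internal and comparative reconstruction. The sketch relies on the two candidate shapes having the same syllable count, so that their vectors have the same length.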