My name is James. I am a former student of Nathan's at SOAS, where I just finished my MA, and I will soon become a research assistant on the Han project. During my time as an MA student, and throughout my dissertation project, I worked on the project I will present today, which is xiesheng imputation. In a broad sense, this is a method for filling in rhyme gaps that are not attested in the actual poetic corpus, using a fairly naive edge-filling method based on phonetic series, which I'll explain later. For this presentation we'll start by broadly surveying what has been done in similar areas, or with applications to this corpus or similar corpora. I will then introduce the idea of applying graph theory to rhymes and what a rhyme network actually turns out to be. I will then describe my method for measuring cluster quality, which turns out to be the most important quantitative metric for assessing the quality of the rhyme annotations I'm comparing in this project. Then I will describe the xiesheng hypothesis, although most of the audience probably already knows what that is, and how it applies to the construction of rhyme networks. I'll then show, with a particular view to specific finals, what that actually ends up looking like, and what new things result from comparing these different rhyme annotations. In this instance I've mainly compared the rhyme annotations of William Baxter, from his 1992 book, and Wang Li, from his 1980 book. As for reconstructions,
I have used both Wang Li's reconstructions from his 1980 publication and the new reconstruction of Old Chinese from Baxter and Sagart; in some cases, when a character was not reconstructed in the latter, I adapted another reconstruction with the mutations necessary to make it Baxter–Sagart compliant. As for sources: my project was based entirely on the Shijing, so I did not at this point consult any other Old Chinese corpora. Like I said before, I used rhyme annotations by Wang Li and William Baxter. The reconstructions for the Baxter rhyme annotations came primarily from OCNR, that is, Old Chinese: A New Reconstruction (Baxter and Sagart 2014). If a character was not reconstructed in Baxter and Sagart 2014, I would mutate a character from Zhengzhang Shangfang's 2003 book to make it Baxter–Sagart compliant. Wang Li's reconstructions are actually complete: every rhyming character in the Shijing has been reconstructed by Wang Li, not necessarily with complete reliability, but it is enough to create a fully colored rhyme graph, and you'll see what that means later on. Here's an example, in case you're not familiar with these two reconstruction paradigms, of how sets of many of the same characters are reconstructed differently in the two systems. More or less, the strings I'm working with in the first instance are the reconstructions shown here on the right-hand side of either column of characters. This isn't that important, but it should guide you toward knowing what the different reconstructions will look like going forward.
Originally these are all just text-based CSV files. You can transform them into more human-readable formats, but as a general principle the data structure at its base looks more or less like this, before I did any transformations or extracted anything from it. You see a poem title here, then a stanza; these are all again from the Shijing. Rhyme word, rhyme, and reconstruction are the main columns my analysis focuses on; even the rhyme column actually isn't that important. The rhyme word indicates the character that rhymes in the line, and the reconstruction is the reconstruction of that rhyme word, in this case from the Wang Li set. These sets were compiled primarily by the team at the Max Planck Institute, so while I have analyzed these data, I can't take credit for compiling them; that is a much more tedious job, and I appreciate someone else having done it. That brings us to the question of what a network actually is. If you've read any of the papers that Nathan, or particularly Mattis List, has been working on for the last couple of years, you'll know; otherwise, a network, or technically a graph, is an abstract mathematical object that represents entities and the relationships between them. These relationships are also abstractions: the graph doesn't assert anything about what a specific relationship is like; all you're saying is that A is connected to B in some meaningful way. A is a node, B is a node, and I connect A and B because I assert that they are related in some way. Much of the time these have baked-in relationships: most commonly you might create graphs of flight paths between airports, or social networks based on friend or follower relationships on social media. But in poetry, you can do the same thing by connecting characters that rhyme with each other.
And that's effectively what a rhyme network is. In the case of a rhyme network, you create an abstraction over the corpus, and then you can slice it to get a finer-grained view of whatever you're interested in. Here's a more concrete example. This is the first stanza of the first poem of the Shijing, and this is what a self-contained rhyme network for this poem would look like. We see here that the characters at the ends of lines one, two, and four all rhyme with each other, so they create this triangular network where each character is connected to all the others; in graph-theoretic terms, that means it's fully connected, with every node connected to every other node. Then, in this case, this one goes out on its own. In the networks that I built, I would generally ignore singleton characters unless they happened to rhyme with themselves in a poem; not characters that appeared by themselves, but characters that appeared in rhyming position only with themselves, in which case they would be put out to pasture like this one. In cases where they only appeared once, the hapax legomena of the Shijing, I would algorithmically ignore them, simply because it cleans up the graph quite a lot. Then there is the matter of measuring the quality of partitions. When you create a rhyme network, you expect characters to group together in some meaningful way. We hope that characters will group together by their finals and by their nuclear vowels; in other words, characters that share the same final and nuclear vowel should form well-defined clusters, and the number of vowels in a reconstruction should more or less correspond to the number of clusters that you find when you slice the graph by final.
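To make this concrete, here is a minimal sketch (not the project's actual code) of turning per-stanza rhyme groups into clique edges and dropping singleton characters; the letters are placeholders, not real Shijing characters.

```python
from itertools import combinations

def build_rhyme_edges(stanza_groups):
    """Each stanza's rhyme group becomes a fully connected clique of edges."""
    edges = set()
    for group in stanza_groups:
        for u, v in combinations(sorted(set(group)), 2):
            edges.add((u, v))
    return edges

def drop_singletons(stanza_groups):
    """Ignore characters that appear in rhyming position only by themselves."""
    return [g for g in stanza_groups if len(g) > 1]

# Placeholder groups standing in for one stanza: A/B/C rhyme, D stands alone.
groups = [["A", "B", "C"], ["D"]]
edges = build_rhyme_edges(drop_singletons(groups))
```

The three rhyming characters come out as a triangle, exactly the fully connected structure described above, and the singleton contributes nothing.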
To do this we use a metric called modularity. Modularity is a statistical measure of cluster quality: it is the difference between the actual number of edges and the expected number of edges in a predefined subgraph. More specifically, if you take a graph and cut out a piece of it, modularity compares the number of edges within the subgraph to the number of edges leaving the subgraph, and then subtracts the expected number, derived in the same way you would derive an expectation for a chi-square test. This is the formula; I could walk you through all the pieces, but I don't think that level of granularity is important here. There are some issues with using this metric that I should declare ahead of time. One of them is that calculating maximum modularity for a graph is NP-hard. In practical terms, if you don't have an intermediate CS background: if something is NP-hard, that effectively means there is no algorithmic shortcut for solving the problem; to calculate maximum modularity, you would have to calculate the modularity of every possible partition of the graph and then sort the whole list. For example, if we go back to this graph, one such partition might be the blue node alone and the three orange ones together. But you could also consider all four nodes as entirely separate partitions, or two orange nodes as one partition and the blue node plus the remaining orange node as another, and so on. As you add more nodes to a graph, the number of partitions grows extremely fast.
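Without walking through every symbol, the formula can be computed directly; below is a small, illustrative pure-Python implementation of modularity for a given partition, summing internal-edge fraction minus squared degree fraction over communities. The toy graph is my own example, not drawn from the corpus.

```python
def modularity(edges, communities):
    """Q = sum over communities of (internal_edges/m - (degree_sum/(2m))**2)."""
    m = len(edges)
    comm_of = {n: i for i, c in enumerate(communities) for n in c}
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for i, comm in enumerate(communities):
        internal = sum(1 for u, v in edges if comm_of[u] == i and comm_of[v] == i)
        degree_sum = sum(degree[n] for n in comm)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q = modularity(edges, [{0, 1, 2}, {3, 4, 5}])  # 5/14, about 0.357
```

Partitioning this graph into its two triangles gives a clearly positive score, while lumping everything into one community gives zero, which is the intuition behind using modularity as a cluster-quality measure.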
If you have three nodes, you already have five possible partitions; the number of partitions of n nodes is the nth Bell number, which grows far faster than exponentially, so at 100 nodes enumerating every partition is completely out of reach. Any algorithm that calculates modularity therefore does not definitively calculate the maximum; it uses some kind of heuristic to explore partitions and then walks back when it no longer seems to be gaining anything. That is the case with the algorithm I decided to use, the Clauset–Newman–Moore algorithm. Effectively, it works by first partitioning the graph with every node as its own community and then merging communities piece by piece until it arrives at a stable maximum of modularity. But because it starts with a high number of communities and then shrinks that number, if, for example, you have 200 nodes in a graph and it finds ten or twelve communities, and going to nine, eight, and seven does not make modularity go up, it will stop; yet it may still be the case that six or five communities is a better partition if you have an a priori reason for partitioning the graph that way. On the other hand, other modularity calculations are significantly more computationally intensive, so this is generally considered the current standard. With respect to the xiesheng hypothesis: I'm guessing most if not all of the people here know what that is, but if you don't, compound Chinese characters are composed of phonetic and semantic radicals, and you can sort characters by their phonetic radicals.
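As a rough illustration of the greedy idea (not the real Clauset–Newman–Moore implementation, which relies on efficient incremental data structures), here is a sketch that starts from singleton communities and repeatedly applies the merge with the largest modularity gain, stopping when no merge improves the score:

```python
from itertools import combinations

def modularity(edges, comms):
    """Q = sum over communities of internal-edge fraction minus squared degree fraction."""
    m = len(edges)
    where = {n: i for i, c in enumerate(comms) for n in c}
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sum(
        sum(1 for u, v in edges if where[u] == i and where[v] == i) / m
        - (sum(deg[n] for n in c) / (2 * m)) ** 2
        for i, c in enumerate(comms)
    )

def greedy_communities(edges):
    """Start with every node alone; merge the best pair until no gain remains."""
    comms = [{n} for n in sorted({n for e in edges for n in e})]
    q = modularity(edges, comms)
    while len(comms) > 1:
        best = None
        for i, j in combinations(range(len(comms)), 2):
            trial = [c for k, c in enumerate(comms) if k not in (i, j)]
            trial.append(comms[i] | comms[j])
            tq = modularity(edges, trial)
            if tq > q and (best is None or tq > best[0]):
                best = (tq, trial)
        if best is None:          # no merge improves modularity: stop
            break
        q, comms = best
    return comms, q

# Two triangles joined by a bridge: the greedy merges recover the triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
comms, q = greedy_communities(edges)
```

This also shows the caveat mentioned above: the procedure only ever walks upward through merges, so it can stop at a local optimum rather than the true maximum.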
Later on, through work first by Karlgren and then by Li Fang-Kuei, we have these things called phonetic series, where characters are put into groups based on the combination of their phonetic radical and, by that point, also the place of articulation of their initial consonants. The xiesheng hypothesis is most commonly articulated by this quote from Li Fang-Kuei's 1975 book: the same phonetic determiner in the writing of two characters implies that the words expressed by these characters have the same rhyme category in the Odes. In other words, even if we have no attestation of two characters rhyming, if they had occurred together in rhyming position in an Old Chinese corpus, they would have rhymed. So if we can encode xiesheng relationships as rhymes, we can draw edges between nodes that lack edges in a rhyme network drawn from the corpus alone, and this will hopefully resolve our singleton-node problem to some extent. The question, then, is whether it does. To do this we also have to make a couple of assumptions. The first is that the ability of two Old Chinese characters to rhyme is correlated with both the nuclear vowel and the final. This makes sense, I suppose, but it should still be stated as an assumption, since ultimately a rhyme is not a fixed quality of language but a somewhat arbitrary linguistic convention. The second, based on the xiesheng hypothesis, is that characters within the same xiesheng series share their nuclear vowel, including cases of phonophoric nesting: in a xiesheng series you might find that a base character generates a character, and that character is then used as the phonetic determiner for yet another character. You can see that here, where the character at the hub of the spokes on the left is used as the phonetic determiner for the character that is the hub of the spokes on the right.
That character is then used as the phonetic determiner for all of the characters on the outer ring. What we're assuming in this case, naively but for statistical purposes effectively, is that all of these would have rhymed with each other had they occurred in rhyming position in an Old Chinese corpus. Then we have a last assumption: when a poet is forced to slant-rhyme, sharing vowels is preferable to sharing finals. This doesn't end up affecting the structure of the rhyme network very much, because we never compare across different finals; we only take final slices of the graph and then look at the distribution of vowels within each slice, measuring the modularity of vowel subgraphs within the final-slice subgraphs of what you can think of as one abstract graph of the whole corpus. So how did we actually do this? In our original data structure we have romanized reconstructions, so we can extract the nuclear vowel from a reconstruction using a regular expression. In the Wang Li reconstructions, for example, we quite often find that a reconstruction ends with the letter i or the letter u, and that first has to be transformed into j or w. Then the string is reversed, and the first vowel we come across is taken to be the nuclear vowel. The same goes for Baxter–Sagart reconstructions, although those contain a lot of superfluous white space, so it's a little more complex; but if you again reverse the strings, strip the white space, and ignore the suprasegmentals, then in theory the first vowel you come across should be the nuclear vowel. Having managed that, we created another column that represents the vowel for each character.
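The extraction step just described might look like the sketch below. This is not the project's actual regular expression; the vowel inventory and the sample strings are made-up placeholders, and suprasegmental handling is omitted for brevity.

```python
import re

VOWELS = set("aeiouə")   # placeholder vowel inventory, not the real one

def nuclear_vowel(recon):
    """Scan a romanized reconstruction right-to-left for the nucleus."""
    s = re.sub(r"i$", "j", recon)   # a final i is a glide, not the nucleus
    s = re.sub(r"u$", "w", s)       # likewise a final u
    s = re.sub(r"\s+", "", s)       # strip superfluous white space
    for ch in reversed(s):          # first vowel from the right = nucleus
        if ch in VOWELS:
            return ch
    return None                     # no vowel found

# Made-up illustrative strings, not actual reconstructions:
v1 = nuclear_vowel("kraŋ")   # "a"
v2 = nuclear_vowel("məj")    # "ə"
```

The point of the final-glide substitution is visible in a string ending in i: without it, the offglide would wrongly be reported as the nucleus.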
We created dictionaries, digital dictionaries in the sense of associative arrays, not lexicographic dictionaries, that mapped characters to their reconstructions and characters to their vowels, for each of the Wang Li and Baxter–Sagart reconstructions and rhyme annotations. Then we build a large rhyme network from all the xiesheng series, irrespective of any corpus data. We filter the rhyme annotations to draw a network only of characters with the same reconstructed coda. We then overlay the two graphs, the xiesheng graph and the coda-filtered graph, remove any characters that are part of xiesheng series but not part of the original coda-filtered character set, and run the evaluation metrics. That was a lot, kind of fast, so does anyone have any questions at this point? No? Okay. As an example, just as a proof of concept, we'll start with final -m, because final -m is pretty rare in the corpus and so it generates a fairly small graph that's easy to inspect visually. Here are the OCNR final -m characters. You can see that floating around the edge are lots of pairs and small groups that don't connect to the main group very much, and while this is good for showing us which characters are actually attested to rhyme with each other, it is not that useful in showing us what the structure of the rhymes in a fully connected world would be like. Then this is the structure of our final graph, with the nodes colored by their core vowel; I have the key somewhere, but I don't think it's that important right now. The thing is, as you can see, a couple of bigger hubs are surrounded by lots of small pairs that should be connected to the main graph, hopefully showing us that the codas actually do cluster together, even though at the moment they do not. And this is a graph of all the xiesheng series, more or less.
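Put as code, the overlay-and-prune step described above might look like this set-based sketch; the edge sets here are placeholders, not corpus data.

```python
def overlay_and_prune(rhyme_edges, xiesheng_edges):
    """Keep the coda-filtered rhyme graph, and add only those xiesheng
    edges whose endpoints both already occur in that graph."""
    nodes = {n for e in rhyme_edges for n in e}
    imputed = {
        (u, v) for u, v in xiesheng_edges
        if u in nodes and v in nodes and (u, v) not in rhyme_edges
    }
    return rhyme_edges | imputed

# Placeholder data: B-C is unattested as a rhyme but shares a series;
# Z belongs to a series but is outside the coda-filtered character set.
rhyme = {("A", "B"), ("A", "C")}
xiesheng = {("B", "C"), ("C", "Z")}
combined = overlay_and_prune(rhyme, xiesheng)
```

The unattested B-C edge gets imputed, while the series edge to Z is pruned away, which is exactly the behavior described for the combined graphs.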
You can't zoom in closely enough to pick one out, because it's simply too dense; I couldn't show you a picture of the whole set and also tease out individual nodes, although in some cases you can see singletons floating around the outside. This mess actually will not affect the final graph very much, because in the next step, when we combine the two graphs, we prune all of the nodes in the xiesheng graph that are not part of the original set of nodes. And we arrive here, which is much better: no more floating nodes; everything has been connected to something else by way of an edge. In a couple of cases we have a node whose vowel is a singleton, a hapax vowel in the corpus, which can't really be helped; it's not going to connect to its partners, and it's not part of a xiesheng series that would connect it to any of these, but ultimately that's also fine, because we've massively cleaned up the graph. In a graph this small and easy to look at, I don't think calculating modularity is that helpful; this is the eye test, more or less, for showing that vowel-plus-final combinations do indeed tend to cluster together, and it doesn't show us anything particularly interesting beyond that. On the other hand, if we set the two reconstructions against each other and look at final groups that are larger, particularly final -ng and final -k, the velar nasal and the velar stop, and assign the vowel as an additional attribute, then we might actually see interesting information that allows us to evaluate the quality of the two reconstructions, whether Wang Li's or OCNR's.
Initially, what we expected was to be able to combine a graph of Wang Li's rhyme annotations with the Baxter–Sagart vowel reconstructions, overlay them, and say, look, it's worse or better, just based on how it looks. I think this actually doesn't work, because while we strategically chose two reconstructions that are about as dramatically different from each other as possible, that also means their rhyme annotations are significantly different, different enough that overlaying the vowels onto the other system's rhyme annotations creates chaos rather than order. It's most apparent with the final -ng graph; the final -k graphs are a little more similar, but still not similar enough, I think, for direct comparison. It also raises quite a few complications for evaluation, because if you go into the analysis imagining that the reconstructions of the later date are going to be somehow better, it turns out that you need quite a lot of quantitative evidence for that. This is OCNR's unimputed graph of final -ng. As you can see, there are quite a few small clusters around the outside, which is an undesirable characteristic; at the same time we have larger clusters in a lot of places, mainly because a lot of characters in the corpus have final -ng. We also see in this case, quite interestingly, a lot of mixture between the lavender and orange nodes, which is going to be a repeating theme: what we've stumbled on here, whether it's, for example, a regional or a diachronic difference, is that in these graphs you quite often see hard partitions between certain vowel groups and softer partitions between others. In this case the lavender and the orange represent, I believe, the -ing and -ang characters as reconstructed.
And it's quite intuitive, I think, not surprising, that -ing and -ang would in many cases be rhymeable; particularly if a poet puts himself in a situation where he is only really able to use an -ang character to rhyme with an -ing character, it's an understandable thing to do. In this case we see that it is something that happened very frequently. Now, in order to fix these outlying characters, we again impute, and it turns out that imputation makes the graph a lot better. If you look at this graph, in the center toward the top right (I don't know if you can see my cursor), you see one pink node sitting among all the blue ones. Although it's not a significantly undesirable characteristic for it to sit among the blue nodes, separated from the pink ones, ideally we would be able to connect it to the rest of its true vowel neighbors, and indeed that does happen; in fact it creates a bridge between the pink and blue nodes. The blue node in the top center is connected to the yellow nodes; it doesn't join the rest of them because it belongs to a series that otherwise does not contain final -ng. We have a similar situation with these two orange ones: they are reconstructed in OCNR with final -ng, but the rest of their series is not, so they do not connect with the rest of the orange nodes. Having said that, if you look closely you'll see that this character shares a phonetic determiner with this one, and so, with a bit of qualitative appraisal, it becomes clear that drawing an edge between, for example, this character and this one is an entirely reasonable thing to do, just not something the algorithm could do heuristically. On the whole, though, this is a pretty good partition of this graph.
At this point, I should clarify that modularity requires two things to be set before you calculate it. One is the structure of the graph, which is preset: the nodes, irrespective of any attributes. It doesn't matter what the character is or what the vowel is; modularity is entirely agnostic to node attributes and only considers the graph as an abstract mathematical object. So if you were to measure the modularity of this graph, it would depend solely on the structure and on the communities that you predefine. In this case you would put all of the blue nodes in a set, all of the pink nodes in a set, then lavender, orange, cream, yellow, declare these to be the communities, and calculate modularity; it will do the calculation for you. You could do it by hand, but it would take an extremely long time. From OCNR we then move to Wang Li. One of the things that is striking about this graph is the amount of internal structure it has compared to OCNR's. It turns out that Wang Li's reconstructions were almost certainly already sensitive to xiesheng relationships, because whether you impute or don't impute, the graphs are virtually the same, including the fact that there are no singletons; all imputation does is fill in additional edges within clusters. So we show only the Wang Li imputed graph. Here we see four quite neatly defined clusters, which correspond to the four vowels that Wang Li's reconstruction allows to precede final -ng. But there are still problems. One is that the Wang Li reconstruction is typologically odd from a cross-linguistic perspective, whereas the Baxter–Sagart reconstructions are definitely not.
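Since modularity only sees the structure plus a predefined partition, the vowel-based communities are nothing more than the node sets grouped by their vowel attribute. A minimal sketch, with placeholder nodes and vowel assignments:

```python
def communities_by_vowel(vowel_of):
    """Group nodes into communities according to their vowel attribute."""
    groups = {}
    for node, vowel in vowel_of.items():
        groups.setdefault(vowel, set()).add(node)
    return list(groups.values())

# Placeholder vowel attributes for five nodes.
vowel_of = {"A": "a", "B": "a", "C": "e", "D": "e", "E": "a"}
comms = communities_by_vowel(vowel_of)
```

The resulting list of sets is what gets handed, together with the edge list, to the modularity calculation; the colors in the plots are just a visualization of this same grouping.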
In the Wang Li reconstructions, we find that only four of Wang Li's seven reconstructed vowels can precede final -ng, and there is no consistent phonological rule, based on the rest of his reconstruction, that would seem to generate this restriction. At the same time, the amount of structure contained in his rhyme graph suggests that there is real value to his rhyme annotations, because this is clearly a well-structured graph. This poses a bit of a problem, because whenever we compare someone's rhyme annotations to their own reconstructions, it's possible to fall into a trap of circular reasoning, where a particular reconstruction informs the rhyme annotations and the rhyme annotations then inform the reconstruction, until we arrive at some sort of convergence. Fortunately, we can also quantitatively appraise the quality of these partitions. One thing to note about both is that modularity can also punish graphs for not having enough structure within their clusters, and in fact that is something that happens to Wang Li. If we measure cluster quality for these two graphs, we find that OCNR's maximum modularity is heuristically calculated at 0.494, but even the pre-imputation OCNR final -ng graph, partitioned by vowel, arrives at a better modularity than what the algorithm could provide. Furthermore, with the addition of xiesheng edges, the modularity goes up even more, which is a good result: if xiesheng edges improve your modularity, you've probably done something right, and that is one of the things we initially set out to test. So where does that leave us: 0.536 and 0.542.
Wang Li's partition, on the other hand, is a little worse than the heuristic maximum modularity, but to an extent that is effectively unimportant, and by adding xiesheng edges, his actually goes up a lot more. I mentioned before that the addition of xiesheng edges doesn't draw new edges to singleton characters in his graph, but it does increase the number of connections within structures, and this is really important, because modularity punishes hub-and-spoke network structures compared to more tightly connected ones. So with the addition of xiesheng edges, by creating more tightly connected clusters in each of the vowel groups, the modularity of his vowel partition increases significantly. Moving on to final -k: this is actually just the inner part of the final -k graph, because it would have been impossible to show you any structure otherwise; there were too many singletons and pairs floating in a ring around the outside. I will also mention at this point, for those of you who are interested, that when I draw these graphs I use the spring layout. It's called the spring layout because the edges between nodes are supposed to act like springs, which creates tension between nodes: the more densely a cluster is connected, the tighter its nodes are pulled together, and a node that is not connected to any other node tries to maximize its distance from the others until an edge is drawn, at which point it is pulled closer. There's a little more to it than that, but that's a broadly impressionistic description of how it works. With OCNR final -k, there's a big outer cloud of characters, not visible in this picture, that needs to be imputed, because it's really undesirable. When you do impute, most of them go away, but you're still left with quite a few along the outside.
The internal structure also becomes tighter. In this case we also see a few green characters: these are characters that neither Baxter and Sagart nor Zhengzhang Shangfang reconstructed, so we had no source from which to impute a reconstruction for them. That may be something to address in the future, but for the moment it's acceptable to simply ignore them and calculate modularity for the graph without the unreconstructed nodes. Moving on to Wang Li's final -k: in Wang Li's system, final -k can be preceded by seven vowels rather than just four, which is why we see seven colors here, and they are quite substantially mixed together in a way that is undesirable compared to final -ng. So with the final -k characters, the internal structure evinced by that visually impressive final -ng network is thrown into question. When we combine Wang Li's unimputed final -k graph with xiesheng edges, however, we get something much more structured. I've already explained a lot about these graphs, so I won't dwell on this, but we do find a couple of interesting things, one of which is this group at the bottom, the blue and the lavender. We see that in Wang Li's reconstructions these two final groups cluster quite tightly together, in a way that would seem to suggest they were in fact a combined rhyming group. Now, because these graphs, at least in their central parts, are closer in terms of vowel association than the -ng graphs, we can combine Wang Li's structure with Baxter's vowels and get something like this. Again, the green characters have no reconstruction in Zhengzhang Shangfang or in Baxter and Sagart, so we can ignore them when calculating modularity; for now it's the non-green characters that matter.
What we see in this case is this group down here, which splits the lavender group in half. The reason is that Baxter and Sagart reconstruct some characters with an intermediate glide between the vowel and the final -k: instead of having just -ak, they have both -ak and -awk. That distinction would not have been picked up by the regular expression that extracts the nuclear vowel, so both would have been treated as a-vowel -k finals. But they are substantially different in terms of rhyme structure, because what we find is a general pattern for the glided characters to rhyme with a neighboring rime category rather than with the plain -ak characters. That was probably the biggest eureka moment of these rhyme network drawings; at the same time, it is the discovery of only one additional rhyme category correspondence, as you might call it. Surely there are others, but we would need to do more final slices and more reconstructions to figure out where they all might be. If we look at cluster quality for final -k, Wang Li's partition still outperforms OCNR's by quite a lot, although the graph generated by OCNR has a much higher heuristic maximum modularity. But that maximum is generated with, I believe, 15 communities, which makes it not very meaningful; it is probably a statistical artifact. If you really measure by the vowels contained within OCNR, excluding the characters that do not have reconstructions, you only get to 0.647, even including xiesheng edges. Xiesheng makes more of a difference to OCNR than to Wang Li in this case: Wang Li's modularity is increased by 0.08, whereas OCNR's is increased by 0.16.
The maximum modularity of the Wang Li graph is also a little lower, but the heuristically calculated number of communities is somewhat closer: in that case it was eight or nine communities, because the heuristic tends to treat clusters that are really extensions of existing communities as separate, rather than as belonging to their own vowel groups. And here are some annotations of which vowels correspond to which groups in the Wang Li structure overlaid with the Baxter and Sagart reconstructions. We see that OCNR mixes *-ek and *-ik here: this one yellow character combines with the orange ones, while the rest of those characters are in the yellow arc at the top. So *-ek and *-ik combine here, in that characters with one of these vowels in Wang Li often correspond to characters with the other in OCNR, and the two combine in OCNR. Interestingly, then, there appears to be an additional rhyme correspondence, between *-ek and *-ik in OCNR and in Wang Li as well, distinct from the *-ak and *-awk one. That is probably the next thing to investigate as far as additional rhyme correspondences are concerned. So what have we found? Well, among other things, we have found that it is difficult to compare rhyme annotations when they are substantially different from each other. We have also found strong evidence for an additional rhyme preference in final *-awk syllables: they may not rhyme most strongly with the syllables that share their nuclear vowel, but instead with a different category.
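As a reminder of what "imputation" means in these comparisons: it is the naive edge filling over xiesheng (phonetic) series mentioned at the outset. The sketch below shows one way such a step might look; the toy series data, and in particular the rule of projecting each attested rhyme edge onto all series-mates of its endpoints, are assumptions for illustration, not the project's exact procedure.

```python
# Hypothetical xiesheng series: phonetic element -> member characters.
series = {"P1": {"A", "B"}, "P2": {"X", "Y", "Z"}}

# Attested rhyme edges from the corpus, stored as unordered pairs.
attested = {frozenset(p) for p in [("A", "X"), ("B", "Y")]}

def impute(attested, series):
    """Project every attested rhyme edge onto all pairs of series-mates
    of its two endpoints, so series members inherit each other's
    attested rhyme contacts."""
    members = {c: ph for ph, cs in series.items() for c in cs}
    edges = set(attested)
    for edge in attested:
        u, v = tuple(edge)
        for u2 in series.get(members.get(u), {u}):
            for v2 in series.get(members.get(v), {v}):
                if u2 != v2:
                    edges.add(frozenset((u2, v2)))
    return edges

imputed = impute(attested, series)
# A and X rhyme, so B (A's series-mate) gets linked to Z (X's mate):
print(frozenset(("B", "Z")) in imputed)  # -> True
```

Filling edges this way densifies the clusters, which is presumably why imputation raises the modularity of both annotation schemes.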
The evidence is mixed overall for which reconstruction system is better. If we take the rhyme annotations to be separate from the reconstructions, then the evidence for Wang Li is a little stronger; but as far as the reconstructions themselves go, if we account for typological evidence and other phonological characteristics, then it certainly seems that OCNR is closer at least to a natural language, though of course that does not mean it corresponds to Old Chinese. And then some Baxter and Sagart characters, particularly in final *-k, do not merge with the major clusters even after imputation, and this may be a call to do a few different things. One is to address the lack of corpus data, which is an easy problem to solve but a tedious one, because it means we need to encode, for example, the Chuci or a similar corpus into machine-readable format. We may need to revisit the state of the reconstructions for those characters specifically. Or we may need to revisit the xiesheng series and see whether something needs to be done to merge characters into series they are not currently in, or to split series that are currently together, in order to bring the characters that currently float along the edge into the fold. And that is what I have to say, so I think it's time for questions; we have eight minutes. You do find a few characters, particularly in Baxter 1992, where if you look at a certain poem where Wang Li and Baxter disagree, it seems that Wang Li's reconstructions are trying to fit into more of a rhyme paradigm. Baxter will create the reconstructions and then assign the rhymes, but if you look at which reconstructions are supposed to rhyme with each other, sometimes, honestly, they don't make a lot of sense.
If I can paraphrase that broadly: it seems that Baxter's reconstruction is more motivated by internal reconstruction from Middle Chinese, which he then applies to the analysis of rhyme in the Shijing. Wang Li, by contrast, was using a more circular approach, or let's say a spiral approach, to make it not sound insulting: he has a reconstruction idea, then does a rhyme annotation, then changes his reconstruction, and so on, so that there is a kind of convergence between the two. I think doing something like that, unless he somehow managed to stumble on a perfect rhyme annotation of the *-ŋ finals, is the only way you could arrive at a network as visually coherent as the Wang Li *-ŋ final graph. Do I understand correctly that if you use Wang Li's rhyme analysis for the *-k finals, you get more structure in this *-wk area than his reconstruction predicts? Yes, that's correct. So if you overlay the Baxter vowels on the Wang Li rhyme structure, you get a better partition than either of them generated independently. To some extent they are each off in their own worlds with *-ŋ, where you can say, well, this one is better in this way and that one is better in that way; but at least in terms of dividing certain vowels before *-wk, Baxter and Sagart's proposals are improvements even by Wang Li's own standards. Yeah, and you can see that: I can show you the graphs again if you want, but very often in a Wang Li graph you find a single-colored vowel group that has a cluster here, then a bridge, then a cluster there, all one color; if you overlay the Baxter and Sagart vowels onto that, those are split. And since Baxter has a different rhyme annotation, that is a kind of independent corroboration. For those of us in the Baxter camp, that's pretty good ammunition.
Yeah, it says that at least some of Baxter and Sagart's proposals do not come straight out of their heads, but are pointed at even by Wang Li's analysis of the rhymes, which suggests they are part of reality, not part of imagination. You talk a lot about desirable and undesirable, so we're wondering: what exactly are your desiderata? My sense is that if we had an infinite supply of poetry, we would get as many distinct clusters as we have nuclear vowels. You might have some tenuous connections between them, but basically, the larger the data set, the more modularly distinct the clusters would become, and you would have as many of them as you have nuclear vowels. Is that a fair characterization of your desiderata? Yes: ideally, if you slice a graph so that you are only looking at one final and you have seven vowels, no node would remain unconnected to its home group. How tightly the clusters are connected within themselves is to some extent variable, because the maximum modularity depends on other properties of the graph, such as how many nodes there are and how many other nodes each node connects to on average. While we're talking about undesirable, we could also talk about undecidable, because maybe a context-free rhyme annotator cannot predict all possible rhymes. That's a Chomskyan joke for anyone out there. What you've just emphasized is that you hate singletons more than you like dense clusters. Yep. Okay, that's fine.
That's how we want reality to be if we have enough data, but you can get there purely artificially, right? I could just take all these nodes and declare that they sit in nice partitions that have no correspondence whatsoever with historical Chinese phonology. So, to the extent that Wang Li's analysis gives him more desirable clusters by your measure, we don't know whether that's because he is better at reflecting the truth or because he is a better magician. That's true. For that, I would say it would be better to look at as many final-sliced graphs as possible, and my instinct at that point is to look philologically at the bridges between clusters and see where those tend to occur. This is a place where adding other corpora would be important: we know this corpus is the one that was scrutinized while making the reconstruction, so we would anticipate that, to the extent that he is a magician, the analysis will fail on another corpus. If it's reality, it will work nicely on another corpus.