So this is a preliminary study about how to use network models to analyze Old Chinese rhyme data. The idea is, I think, an obvious one, but I never had time to actually look into it; recently, for this workshop, I realized I had some pressure, so I just coded the data, made some first applications, and wrote some things up. I hope I can show that computers can be useful for our research in historical linguistics in general, but also for specific cases, even if we do not believe in what computational methods give us. Even if we do not believe in any of their results, they can still be useful to assist hypothesis finding and hypothesis checking, and that is what I want to focus on.

Okay, network models: this is all about rhymes and networks, so let's start with rhymes. I take the opportunity to quote my favorite rhyme: "lose yourself in the music, the moment, you own it, you better never let it go, you only get one shot, do not miss your chance to blow, this opportunity comes once in a lifetime". Why do I take this? Because it is interesting with regard to rhyme analysis. If I were given an exam and the teacher asked me to identify the rhymes in this poem, or whatever we call it, I would give the following analysis: "music" rhymes with "own it", "go" with "blow", and "shot" with "not". Of course, teachers might criticize me for arriving at this, and Germans would even rhyme "employ" and "deny". The question is which analysis is better; I don't know, but this is what I would propose.

Now, what about networks? A network is a simple data structure: a node represents an object, and an edge represents a relation between objects. We can tag or label nodes, as shown here by using different colors, so we can say: this node is red, that one is
blue. We can give nodes names, and we can do similar things with edges: we can label them and weight them, so that this edge is thicker than the other one, meaning a stronger connection, a stronger relation between the objects. We can also direct edges, but directed networks won't be used in this application, because rhyme networks are not directed, as you will see later. This is just to illustrate that networks are actually really simple. The good thing about networks is that there are many applications, both for visualization and for analysis, that we can use to get more out of the data once we have it in network form.

Now, how do we get from rhymes to networks? If we take a stanza from the Shijing and mark the rhyme patterns, as here in red, and we say that this character rhymes with this one, then we can simply say: a character is a node, and a connection is drawn whenever the words rhyme in a poem. If we say that two words rhyme, then we make a first connection in the network. If we add more stanzas of the Shijing and again identify what rhymes and what does not, we can make further connections; in this case one character occurs twice, here and here, and then we can connect them. So this is the simple idea of how to make a network of the Shijing rhymes.

So how do we construct such a network? First I had to prepare the data. The starting point was the rhyme annotations given in Baxter (1992). The data was not digitally available, so I transferred Baxter's annotations to a digital version of the Shijing. I took the Project Gutenberg version, which has all kinds of problems, but I couldn't crawl the data from ctext.org because they would block me, so I just took this as a starting point. Then I went through Baxter's book and annotated whenever something rhymed.
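As a rough sketch of this construction idea (the stanza data below is invented for illustration, not taken from the actual annotations), the node and edge weights can be derived like this:

```python
from collections import Counter
from itertools import combinations

# Invented example input: for every stanza, the list of annotated
# rhyme words (these are not the actual Shijing annotations).
stanzas = [
    ["之", "時", "來"],
    ["之", "時"],
    ["來", "哉"],
]

# Nodes are rhyme characters; the node weight counts how often a
# character occurs in a (potential) rhyming position.
node_weights = Counter(ch for stanza in stanzas for ch in stanza)

# An edge links two characters whenever they are annotated as rhyming
# in the same stanza; the edge weight counts how often that happens.
edge_weights = Counter()
for stanza in stanzas:
    for a, b in combinations(sorted(set(stanza)), 2):
        edge_weights[(a, b)] += 1

print(node_weights["之"])          # 2
print(edge_weights[("之", "時")])  # 2
```

Sorting each pair gives a canonical key, so the undirected edge between two characters is counted under one tuple regardless of the order in which the words appear.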
The digital version was corrected during this process: I would find cases where Project Gutenberg does not give the correct character. I was under some time pressure, so I worked half an hour every day, in order not to get too bored of it, and after one month I had corrected those cases. Furthermore, I also had a digital collection of most of the Old Chinese reconstructions in the new OCBS (Baxter-Sagart) system, provided by Laurent Sagart.

So this is what I have now. I organized the data in a structured way, as you can see here. First we have the poem, which is something we know; then the stanza, which we also know: 1, 1, 2, and so on. Then I annotated something which I call a section: these are the units that are followed by a comma or a full stop in the Old Chinese text, for the simple reason that the ends of sections are the potential rhyme locations, once we strip off things like the affixes that are sometimes added to the lines. We could also use this to automatically try to detect what rhymes, but I did not do that, because I took the annotations. So if a section contained a rhyme word according to Baxter's annotation, this was noted as such, and if I detected further rhyme words, or had reasons to disagree with Baxter's annotation, this was noted in an alternative annotation. For each section I tried to identify the Old Chinese readings in the OCBS system, but this was not possible in all cases; some 400 readings are missing, and I still did not have time to check them.

The first step I then made was a little app, because nowadays everybody makes apps, and I also wanted to make one. I didn't actually give it a real name, but you can access it via this link, and I will just show how it looks.
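To make the data organization just described concrete, here is a hypothetical sketch of a single record; the field names and values are illustrative assumptions, not the actual schema of my tables:

```python
# One row of the annotation table as a sketch; the actual column names
# and values are illustrative assumptions, not the real schema.
record = {
    "poem": 1,            # number of the poem in the Shijing
    "stanza": 1,          # number of the stanza within the poem
    "section": 2,         # unit ending in a comma or a full stop
    "rhyme_word": "鳩",   # character in the potential rhyme position
    "baxter": "a",        # rhyme annotation following Baxter (1992)
    "alternative": None,  # alternative annotation where I disagreed
    "ocbs": None,         # OCBS reading; None for the ~400 missing ones
}

# Rows without an OCBS reading can then be collected for later checking:
missing = [r for r in [record] if r["ocbs"] is None]
print(len(missing))  # 1
```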
For each word we can see where it occurs. The words are sorted here; you can sort by the hànzì, and then we have the pīnyīn, a gloss, the Middle Chinese reading where I had it (some things are missing), the Old Chinese of Baxter and Sagart, the rhyme group, and what I got from the metadata. Now the cool thing about all this is the following: I can search for a character (I need to switch to Chinese input); say I want to see all instances of a given character in the data, and I find them here. Then we can actually look at the poems where it occurs by clicking here, and you see the rhyme words annotated: you see that it occurs in this position, what it rhymes with, here are the reconstructions, and here is the next stanza. I use this color scheme to make it easier for people to detect the rhymes. I think this is already really useful, at least for beginners; it was useful for me afterwards, because when looking at a poem it is usually difficult for people who do not really know the classical texts to identify the rhymes. But this was just the first step, to illustrate.

Now, getting back to the network I was talking about: how did I construct it? I took all characters which occur in the Shijing in a position annotated as rhyming; they are the nodes. The links between two characters are drawn whenever they are annotated as rhyming in a given poem, and the number of instances in which two characters rhyme in separate stanzas was counted and assigned as the edge weight. The node weights were derived from the number of times the rhyme words occur in the Shijing in a potential rhyming position. The data was then further normalized (we need to normalize when working with this kind of data) by counting every pair of identical
sections only once, in order to avoid weighting repeated phrases too heavily. We know that there are many phrases in the Shijing which are repeated across poems, and we shouldn't count them two or three times, because then we would think that these words are really strongly connected, when maybe it is just because people imitated whoever first created the phrase. This is easy to do: we simply count each identical section once.

What I realized later is that I should normalize further. If we have a poem or a stanza in which three words a, b, c rhyme with each other, then I make three links: a-b, a-c, and b-c. This may be a problem, because we could assume that people who write a long stanza and want all words to rhyme get sloppy: they get tired and think, okay, this doesn't really rhyme, but I will still add it here, because it makes sense in this case; we can see this in hip-hop as well. In order to account for that, one should maybe divide by the size of the rhyme group. I didn't do that, for the simple reason that I only detected this problem later, but I think the data is still useful: we have a certain bias, but we can work with it.

Now, regarding the analysis of the Shijing network, let's first look at the whole thing. I think this is quite interesting, because we have almost a small-world graph, one where you can get from almost any node to any other node in a few steps. We have some disconnected components, and some characters are completely isolated, but we have this large cluster of things that are all interconnected. Now what do we do with such a network? We could zoom in and see whether we see something, but that is not really interesting; I realized that it is rather difficult to find any structure this way. So we need something with more possibilities.
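The two normalization steps described above, counting identical sections only once and, as a possible future refinement, down-weighting large rhyme groups, could be sketched like this; the sections and the 1/(k-1) correction are illustrative assumptions:

```python
from collections import Counter
from itertools import combinations

# Invented sections: (text, rhyme_words). Verbatim repetitions of whole
# sections across poems should only be counted once.
sections = [
    ("之子于歸", ("歸", "家")),
    ("之子于歸", ("歸", "家")),   # repeated formula: counted once below
    ("彼美孟姜", ("姜", "翔", "忘")),
]

edge_weights = Counter()
for text, words in set(sections):    # deduplicate identical sections
    k = len(words)
    for a, b in combinations(sorted(set(words)), 2):
        # a group of k rhyme words yields k*(k-1)/2 pairs; dividing by
        # (k - 1) is one simple way to down-weight long rhyme groups
        edge_weights[(a, b)] += 1 / (k - 1)

print(edge_weights[("家", "歸")])  # 1.0 (not 2.0, thanks to deduplication)
```

Because the sections are stored as hashable tuples, putting them into a set removes the verbatim repetition before any counting happens.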
Since we have labels, we can visualize them by coloring the nodes. We could take the five, or rather six, vowels of Old Chinese and color the nodes accordingly, and what you can see is that some structure emerges. This tells us that the network has an internal structure; what we see here is not complete nonsense. And once we have something like that, an almost small-world network, we can also identify clusters, and these are actually the things we are interested in, because these are the groups for which we might have the reconstruction of a rhyme group; clusters are what we can try to identify here, and I will talk about that in a minute.

I looked at transitions between the rhyme groups, because I was interested in how many cases we have of one group rhyming with another. But as you see here, this doesn't give us anything in this form; maybe there was some bias in the coding, or the approach of just agglomerating all nodes of the same rhyme group into one node and then seeing how much they interact simply doesn't work: it gets really messy. What we can also do, and this is maybe more interesting, as you will see if we look at a smaller part of the data, is computing how often a certain rhyme group rhymes within itself and how often it rhymes with other groups. Rhyme group here means the Old Chinese Baxter-Sagart reconstruction, ignoring the final glottal stop and *-s, which is not ideal, but it is a first step. You can then see how often the groups rhyme, and whenever a group has a deep red color, it means that it mostly rhymes within itself, so it is a really well-established group. But we also see some spots where groups almost rhyme more often with other groups than with themselves.
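Such an inside-versus-outside table can be computed directly from the weighted edges; the edges and the character-to-rhyme-group mapping below are invented placeholders, just to show the mechanics (the real groups come from the Baxter-Sagart reconstructions):

```python
from collections import Counter

# Hypothetical weighted rhyme edges and an invented character-to-group
# mapping, standing in for the Baxter-Sagart rhyme assignments.
edges = {("降", "中"): 3, ("中", "宮"): 2, ("同", "東"): 4}
group = {"降": "*-um", "中": "*-uŋ", "宮": "*-uŋ", "同": "*-oŋ", "東": "*-oŋ"}

# For every pair of rhyme groups, sum up how much rhyming happens within
# a single group versus across two different groups.
matrix = Counter()
for (a, b), w in edges.items():
    ga, gb = sorted((group[a], group[b]))
    matrix[(ga, gb)] += w

print(matrix[("*-uŋ", "*-uŋ")])  # 2  (within-group rhyming)
print(matrix[("*-um", "*-uŋ")])  # 3  (cross-group rhyming)
```

A group whose diagonal entry dominates its off-diagonal entries is the "deep red", well-established case; a group with large off-diagonal entries is one of the spots worth inspecting.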
In these cases we might start looking for patterns that we should maybe revise, or we might find an explanation in the approach that I used. But even better is searching for communities. Communities are, in my opinion, the most interesting thing in network analysis. To explain quickly what a community is: the concept comes from social network theory. People are interconnected: we have some people here, and they know each other, x knows y and y knows z. If we look at the connections, we can identify certain connections that are more important inside a group than outside, and we can show this, for example, like this: in this case I would say that this is a group and this is a group, while these links between them are less important; maybe those people just happen to know each other coincidentally. By this we can split a network into two communities, in this case, and label them accordingly. We can give them labels like Chelsea and Liverpool, or we can simply assign numbers to them.

Using this idea, I applied a community detection algorithm to the network. I used Infomap (Rosvall and Bergstrom 2008), which is in my opinion a really nice algorithm: it is a fast community detection algorithm with very good performance, and in my experience, also when working with other data, it always yields really nice results. It handles weighted nodes and weighted edges, both of which we have in this network, and it uses random walks on the network to determine the best partition into communities. The results can again be inspected in another app, which I will show now, and I hope I can make it a bit bigger.
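Infomap itself is available in ready-made implementations, but to illustrate the general idea of community detection on a weighted network, here is a tiny self-contained stand-in using weighted label propagation (a much simpler method than Infomap) on an invented graph with two dense clusters joined by a weak bridge:

```python
from collections import defaultdict

# The talk uses Infomap (Rosvall and Bergstrom 2008) on the real network;
# this toy graph has two dense triangles joined by one weak bridge edge.
edges = {("a", "b"): 3, ("b", "c"): 3, ("c", "a"): 2,
         ("d", "e"): 3, ("e", "f"): 3, ("f", "d"): 2,
         ("c", "d"): 1}

neighbors = defaultdict(dict)
for (u, v), w in edges.items():
    neighbors[u][v] = w
    neighbors[v][u] = w

labels = {n: n for n in neighbors}   # start: every node in its own community
for _ in range(20):                  # a few sweeps are enough to stabilize
    for node in sorted(neighbors):
        # adopt the label with the highest total edge weight around the node
        score = defaultdict(float)
        for nb, w in neighbors[node].items():
            score[labels[nb]] += w
        labels[node] = max(sorted(score), key=score.get)

print(labels["a"] == labels["b"] == labels["c"])  # True: one community
print(labels["d"] == labels["e"] == labels["f"])  # True: the other
```

The weak bridge c-d is outweighed by the strong within-triangle edges, so the two triangles end up with different labels, which is exactly the intuition behind splitting a rhyme network into communities.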
Now let's look at the big picture: we run this analysis on the network and split it into communities, split it into groups, and the question is what we get. If we look at this, this is what we get: you can see that we have large clusters, but in fact almost 400 different clusters. This view is only showing the connections from one cluster, and they are all still interconnected, so searching for structure here is still difficult. What do I find here? Maybe it looks impressive, having a nice visualization of a network like that, but it doesn't tell us much by itself.

So I also looked into what the communities actually show. They do not necessarily overlap directly with the OCBS reconstruction of the rhymes: the same rhyme in the OCBS system may be split into two communities. This also makes sense, because we know that not all words will be used in the same contexts in order to rhyme. This is why interpreting the data is difficult: a community identified by Infomap is not necessarily homogeneous, since a rhyme is also not homogeneous. We have cases, especially when a character occurs only once in the Shijing, where it will be assigned to some place where we do not want to see it. A split of words with the same rhyme into two communities does not imply that they do not rhyme; we always need to go back to the real data and see what is going on there.

This is why I think the app is useful, because here we can search for certain parts; it is all really preliminary. I can, for example, say: give me all rhymes where the Old Chinese rhyme equals *-an, and I press okay, and it filters all the instances where we have any of these in the data. We can see here, for example, a group, community 16, and now we can click and see all the characters where they are;
here we can see the stanzas in which they occur, and we can check this with the other app: if I click on this and open the link in a new tab, I see in which poem it occurs. I think this is also useful just for checking specific instances.

Okay, getting back: what I then did was break it down, looking only at certain cases. What I wanted to look at was the *-r coda, because this is something that is new compared to Baxter (1992), and something which may be worth looking at from this perspective, because the rhymes may give us some evidence here. And here again is the same kind of view as before: inside the group, outside the group, how do they rhyme? We find certain well-established cases: *-aj, *-an, and *-ar seem to be rather clear here, but we also have cases where it is less clear; *-ar seems to rhyme more often with *-an than with itself. These are the cases we need to look at in detail. So I am not claiming that we should just use these results as they are; we have to go back and look in detail at what is going on in the individual cases.

But now an interesting case, just to illustrate that this can be useful, or to advertise it a little bit. Looking, for example, at this split and at the network here, you can actually see that it seems to have some structure: we have a cluster here, a cluster here, and a cluster here, and you see the annotations of the rhymes in the Old Chinese Baxter-Sagart system, as Laurent Sagart provided them to me. This is only the first view, where I ignored the fact that the book actually has this nice practice of indicating uncertainties; here I simply ignored all uncertainties and took the reconstructions as given. But if we add the uncertainties, the picture changes somewhat, and we see that the structure, with one set of rhymes being over here, the green rhymes being potentially over here, and the blue rhymes, which should be a transition group, occurring here, makes much
more sense. So with the uncertainties taken into account, we actually have a case where we can use this analysis to provide additional evidence to resolve them, given something like this potential split. Of course, it doesn't mean that, for example, this green character over here really belongs there; maybe it is only there because it occurs just once and is isolated, so we need to be careful when looking at the data. Now the same view again, but using the community detection algorithm and how it splits this up: it splits it into separate cases, as you see here, and here again I do not show the unclear cases. If you look at this cluster, for example, and I now switch to the view with the unclear cases, you can see that all the green cases which were annotated as potentially uncertain turn yellow, and they are only surrounded by blue ones. If we look closer, we have these characters here, and now, provided that the reconstruction is really certain about whether a given character should really have an *-r coda, while the others carry uncertainties, we could draw the conclusion that maybe the whole cluster should be reconstructed with *-r. This is a point where a detailed analysis (and I am only showing the preliminary stuff) could potentially yield an improvement of the system by resolving the uncertainties.

That is all I wanted to show; only a short outlook remains. Where are we with this? Well, rhyme analysis based on the network approach is still strictly experimental. We need to enhance the data: there are missing readings and dropped lines in the Shijing text, all kinds of problems, since the version I was using was not particularly good. And we need to enhance the models, with better normalization, as I mentioned before. But already at this stage it turns out to be useful to inspect the automatically identified clusters when in doubt regarding the reading of a given
character. I think it is generally useful to make use of interactive visualization techniques when dealing with large amounts of data, and tools like the Shijing Rhyme Browser are especially useful for beginners, but probably also for experts.

And where could we be? Imagine a world in which we have large collections of rhyme networks for all kinds of poetry, ranging from Shakespeare via Bob Dylan up to Eminem. We could then gather important information on rhyming behavior, both cross-cultural and culture-specific. We could track the emergence of hip-hop, or the degradation of rhyming patterns in modern poetry, or we could even try to test the influence of the "Judas" call on Bob Dylan's rhyming practice. Imagine, seriously, that we could carry out large-scale comparisons of rhyming practice in different stages of Chinese, that we could propose transparently our individual assessments of what we think rhymes in pieces of Old Chinese poetry, and that we could trace the history of Chinese poetry in networks. Doesn't that sound like it could be interesting?

Okay, thanks to Laurent Sagart and William Baxter for discussion, tips, and ideas, thanks to Bob Dylan, Eminem, Shakespeare, and all the other poets out there, and thanks to you for your attention.