Well, good morning everybody, or good afternoon or good evening, as the case may be, and thank you very much for attending our presentation. What we'd like to do is spend a bit of time telling you about the project we're running at Hong Kong Baptist University, which is about translated material in Wikipedia articles. The title of the project is "Understanding Wikipedia's Dark Matter: Translation, Knowledge and Point of View". We see translation as Wikipedia's dark matter because we know it's there, but we don't always know where it is, and indeed we don't know how much of it there is. Our research focuses on analysing multilingual sets of Wikipedia articles on a number of current affairs events. We'd like to acknowledge the grant that we were awarded by the Hong Kong University Grants Committee, and if you're interested, do take a look at our blog, a WordPress blog called Wikipedia's Dark Matters, at that URL.
So, in brief, the objectives of the project. We've been working on Wikipedia for a number of years and we have a few ideas as to how to investigate this somewhat complex topic, but we want to see if we can improve the way we search for and analyse translated material. Our research is based on multilingual case studies, and we are aiming to carry out further case studies in which we'll be looking at the location of translated material in Wikipedia articles and the role that it plays; more of that in a minute. We're interested in extending the research into new areas, possibly areas that we haven't anticipated, but I think, in a way most importantly, we want to contribute to the growing body of knowledge about Wikipedia translation. As you all know, Wikipedia exists in more than 300 languages, and there is a lot of translation that takes place.
Wikipedia is an incredibly important resource, as you all know, but we consider it useful to know exactly where the information and the knowledge that you read comes from: who puts it there, and where does it come from? So we're hoping to contribute to an answer to that question. We're looking to develop one or two new analysis tools, and I'll give you a brief preview of one of them a little bit later. Hopefully we'll make some of our data sets available and, as I said, we have the project blog.
If you're analysing Wikipedia digitally, there is quite a large number of tools already available to help you do this. In fact, we like to follow the principle of using the materials, the resources and the tools on Wikipedia in order to study Wikipedia. So we look at the revision history statistics, for example. We look, where relevant, at the talk pages. Those are a bit hit and miss: we're interested in translation, and in many cases that's not a topic of discussion, but where it is, obviously that's a useful source of information. We use the Pageviews, Langviews and Siteviews statistics as well, to obtain numerical information on the different language editions that contain a version of the article on the particular topic we're looking at.
WikiBlame is a tool that we use quite a lot. The point of WikiBlame is that it allows you to trace back the first use of a word or a phrase in an article's revision history, and that's quite useful. You can see when a word is first used, firstly in English and then secondly in another language. If that word forms part of a sentence pair and you're not sure whether one sentence has been translated from the other, this can give you an insight, depending on which one was the first to appear, into which was the original source and which language it was translated into.
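As a rough illustration of the first-appearance idea just described, here is a minimal Python sketch. It does not call WikiBlame or any Wikipedia service: the `Revision` structure, the function names, the timestamps and the example sentences are all invented for the sketch, and a real study would load revisions from the article's revision history instead.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    """One saved version of an article (hypothetical structure for this sketch)."""
    timestamp: datetime
    text: str


def first_appearance(revisions: Sequence[Revision], phrase: str) -> Optional[Revision]:
    """Return the earliest revision whose text contains the phrase, or None."""
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        if phrase in rev.text:
            return rev
    return None


def likely_direction(first_a: Optional[Revision], first_b: Optional[Revision],
                     lang_a: str, lang_b: str) -> Optional[str]:
    """Guess the direction of translation from which phrase appeared first.

    Returns None when either phrase was never found or the timestamps tie.
    """
    if first_a is None or first_b is None:
        return None
    if first_a.timestamp < first_b.timestamp:
        return f"{lang_a} -> {lang_b}"
    if first_b.timestamp < first_a.timestamp:
        return f"{lang_b} -> {lang_a}"
    return None


# Invented toy histories; not taken from any real article.
en_revisions = [
    Revision(datetime(2021, 1, 1, 9, 0), "Article stub."),
    Revision(datetime(2021, 1, 1, 10, 0), "Article stub. The minister made a statement."),
]
zh_revisions = [
    Revision(datetime(2021, 1, 1, 9, 30), "條目小作品。"),
    Revision(datetime(2021, 1, 1, 12, 0), "條目小作品。部長發表了聲明。"),
]

en_first = first_appearance(en_revisions, "made a statement")
zh_first = first_appearance(zh_revisions, "發表了聲明")
direction = likely_direction(en_first, zh_first, "en", "zh")
print(direction)
```

An earlier first appearance is only a hint, not proof: both sentences could have been taken independently from the same outside source, so the result still needs checking by a human reader.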
There's also a tool called WikiWho, which I'm sure some of you are aware of. This is a kind of add-on to Wikipedia, and you can get it to highlight the sentences and the material in a particular article that are attributed to a certain editor. There's a slightly cut-down version of that as well, called WhoColor. And then we have the project alignment tool that our research assistant is working on developing, and this is extremely useful. It makes the process of identifying potentially translated material a whole lot easier. It's not fully developed yet; this is where it's at at the moment. These are sentences taken from an article in English and the corresponding article in Chinese, and you'll see the score on the left-hand side. This is a score reflecting the probability that those two sentences correspond to each other, in other words, that one is a translation of the other. I'm sure you'll appreciate that that takes out a lot of the sweat involved in identifying translated material. If you're interested in that, do take a look at our presentation on this tool, which you can find in the on-demand section of the conference.
So if you're researching Wikipedia translation, I hope I've made it clear by now that there is a huge lack of clarity regarding how much translation there is, what its focuses are and where it's likely to be found in the course of a particular article. I like to think of Wikipedia articles as objects that exist in four dimensions, including the dimension of time. As you know, all past versions of an article are archived in the revision history, which is attached to every article, and they're available to inspect there. Untangling the way in which two articles have evolved in parallel can be rather complex and painstaking.
Typically, this would be the English Wikipedia article and one in another language you might be interested in. English, rightly or wrongly, for better or for worse, tends to be the go-to version if editors in another language are looking to populate their own developing article. Wikipedia translation, which is the term we use to refer to this, does not only involve bringing material from one language version of an article into another. Very frequently, possibly more frequently, it involves translation from outside the encyclopedia. With multiple editors working on an article, and the content coming from a range of different sources, Wikipedia articles very quickly come to look like collages.
We're interested in news stories, and in particular rapidly developing news stories. Translation in the news has been researched in quite a lot of detail, and it does not generally involve translation of an entire text. It involves a number of processes like the ones enumerated in the first bullet point: selection, correction, verification and so on. So it's compiling a new text on the basis of different sources, and this is what we see in Wikipedia, where one of those sources may well be another article in another language. The special role that Wikipedia plays in reporting or chronicling news events is that it falls somewhere between the timescales of newspapers, news broadcasts and so on, and that of history writing. In the articles we're looking at, the time frame tends to be very short, and I suppose I'm particularly interested in the early stages of reporting on a recently breaking news story. The typical way of reporting on a news story in Wikipedia is for quotations from press sources to be brought in and then worked into a coherent article. So translation on Wikipedia is rather atypical.
First of all, a translator, a translator-editor, will decide what they want to translate, and then once it's been uploaded and published, any other editor can obviously edit their translated text mercilessly, and it takes on a life of its own. This means that the source text and the target text are both subject to change. I tend to refer to them as moving objects, which means that the equivalence that might exist after a translated segment has been inserted might not persist for very long. As you know, no finalised version of a Wikipedia article, or indeed of a particular translation, is usually arrived at, and, following on from what I was saying about news translation, the distinction between translation and original writing, summarising, paraphrasing and source sharing can get somewhat blurred.
Now, point of view is a concept that's very important in Wikipedia, particularly since the official policy is to pursue a neutral point of view principle. I know that editors do strive after this very conscientiously, but in reality, of course, complete neutrality is very difficult to achieve. That being the case, one would expect different versions of an article to have different points of view, and that is indeed very often what we see. Sometimes it's referred to as linguistic point of view, the point of view that is specific to a particular language version of an article. Point of view has perhaps not been theoretically formalised to the extent that another related concept has been, and that's the concept of narrative, a narrative being an underlying story that gives events a particular meaning to a particular group of people. There is a clear overlap between point of view and narrative, and narrative is being actively investigated in translation studies, but at the moment point of view has not been theorised in any detail.
Now, when you're looking at a Wikipedia news article, you can look for clues as to the point of view, or sets of points of view or narratives, that it contains in quite a number of different features of the article: for example, the way it's structured or balanced; the information that's included or highlighted, or maybe excluded; the selection of comments and reactions that are included; the sources for quotes and expert opinions; the number of references in different languages as well, which is quite significant; also the tolerance of controversy, the general tone of discussion and so on. And of course, as an article is developed, the point of view that it represents may also slowly shift. So it's difficult to answer this question generically. I'm going to be focusing on a particular case study very shortly, and we're going to be looking at an English article and its translation into Chinese, which does contain a lot of translation. Typically, translation will involve text fragments, which can be from another Wikipedia article or from an external source, and also copying of major features from one to the other.
Now I'm going to jump on a little bit, because I do want to look at the case study itself, which involves an event that happened about three years ago. I'm sure many of you, or all of you, will remember this. There was a poisoning event: a former Russian spy living in the UK and his daughter, who was visiting, suddenly fell ill after an attack with a nerve agent called Novichok. This was widely reported and written about in Wikipedia; in fact, there are now 25 different versions of the article in the multilingual encyclopedia. The English article has been developed in great detail by many editors, over 500 editors, and it's now approaching 10,000 words in length.
There is the brief breakdown of contents, a very wide-ranging article. And then the Chinese version is somewhat shorter, worked on by fewer editors and not as actively developed, as you'll see, and there's a much shorter, more succinct table of contents. Without our tool, finding the translation was highly painstaking, in fact, because if you were looking at a particular sentence you had to search for its equivalent using keywords, people's names and so on, and then you could trace back to where each one first appeared in the revision history. You can see those snippets at the bottom of the screen. In cases where this was a matter of some doubt, it would allow you to see exactly which one appeared first and therefore what the direction of translation was. In this case it's fairly apparent that the translation was from English to Chinese. So that's an example of a snippet, a small piece of text which was translated from one version to another. I also looked at the Russian: there was a lot of translation, mostly from English to Russian, partly also in the other direction, but for the Chinese there was a massive amount of translation.
Now, the other feature I would like to look at: following this event there was a series of tit-for-tat expulsions of diplomats, and this was presented in a number of Wikipedia articles in the form of a table. It actually appeared first in the Russian, and then it was added shortly afterwards to the English and then to the Chinese, and it also appeared in a number of other Wikipedia versions. So we see that in English it was put in alphabetical order