Can you hear me? Yes, I can hear you. I hope there's no echo. Radhika, you're logged in twice. That's why we're hearing an echo. You have to close one of these windows and then share your screen from one window. Does that make sense? Can I start? Yeah, but you have to close one of these windows. You're logged in twice. Okay. I see your screen. Yeah, you can start now. I'm going to log off.

So I'm going to start. I don't see my participants, but I'm going to start now. Is that all right? Raymond, I hope I'm audible. Okay, great.

So thank you. I'm enjoying being in Wikimedia. This is my first time. Let me introduce myself. I'm Radhika Mamidi. I'm part of this team. We all work at IIIT Hyderabad, that's in India, and we are working towards developing articles in Telugu Wikipedia. So this is my team, and they also have a session on Monday. Basically, our intention is to increase the number of articles in Telugu Wikipedia, which is very small. Let me make this larger. If you compare with other languages, the number of articles in Telugu, my mother tongue, is 70,000-plus at the moment. Our team's intention is to increase this number using different methods.
Some of the methods are building a community and writing from scratch, manual translation of existing textual material, using machine translation systems, and then of course using technology. The most important thing is that we have to invest our energies in quality checking by human post-editors. That is our motivation: to increase the content in Indian-language Wikipedias. We are not working on all the languages. At the moment we are working on Telugu, and Hindi comes later.

So this is my title, which says: to translate or not to translate by the machine, that's the question. Thanks to Shakespeare for this dilemma, but I hope by the end of the session our dilemma will be solved. This is how I have planned my outline. We look at translation, then machine translation, then some of the problems that exist when we take English as the source language and try to translate into Telugu, and then, given the time constraint, some of the post-editing challenges that one would face.

Coming to translation: translation is needed to break the language barrier. In today's world, with the help of the internet and web 3.0, the barriers are no longer there. But at the same time, interest in one's own language has become important. In India, most of us are bilingual. We know English, but at the same time we want to do something for our own languages. Most of the knowledge is available in English, and we want the same to be available in our mother tongue. The advantage of human translators is that the translation quality is very good.
It is of publishable quality. In spite of all the success in machine translation, we can still say that human translators are unchallenged. But if we say that human translation is that good and machines are useless, then let us look at some examples. We have seen many translations where humans have made lots of errors, and human translators take more time to translate compared to machines. Similarly, human memory has its limitations. For example, you may spend a lot of time looking up an equivalent in your language, but after 10 days you may have forgotten that word and have to look it up again. So memory is one advantage machines have over us.

This is just to show that we have to be kind to machines, because humans are also not so good at translation. Many of us have come across funny translations, right? For example, this is at a hotel lobby, and it says that the lift is being fixed for the next day: "During that time, we regret that you'll be unbearable." This is in a country where English is a second language. You can see that the wording could have been "please tolerate" or something similar, but when the nuances are missed in translation, it can become humorous. There are some more examples here to show that translation by humans can also be funny. This is one from India, which says "eating carpet is strictly prohibited." As a linguist, I can make out that it is the lack of a preposition here that makes it funny. But this is the most dangerous translation. It says "hand grenade."

Given this, let us look at what translation is. Translation is an art, or translation is a science, or translation is a craft. Different people, philosophers, linguists, psychologists, sociologists, have contributed to the theory of translation.
Many argue that translation is an art, because a translator uses imagination and creativity to come up with another text. At the same time, other scholars will say that it is a craft, because it needs lots of practice and lots of training to do a good job at translation. And we can say that, compared to human translation, machine translation is a craft. You need lots of skills to do it, lots of training, and lots of data to learn from.

This is a picture I took from some money-related site. It shows a good ecosystem in which both machines and humans can be a part of this whole translation exercise, with humans as post-editors contributing to good translation. So one need not say that machine translation is the final say. Once the output has come, post-editors can contribute to it.

Let us look briefly at machine translation. Many of you may already know the history of machine translation, but let us quickly spend a couple of minutes on that. We have come a long way from here, when the initial machine translation system translated this sentence funnily. Today, of course, it translates much more appropriately. Machine translation follows the same path as human translation. That is, it decodes the meaning of the source text and re-encodes this meaning in the target language. So basically it analyzes and generates language: it analyzes the source text and generates the same meaning in the target language.

Any translation, whether human or machine, needs five types of knowledge: knowledge of the source language, knowledge of the target language, the equivalences between source and target language, knowledge of the domain, and general knowledge, that is, world knowledge and socio-cultural aspects. For example, if you are translating from English to, say, French, you should be aware of all the conventions of French culture. So basically, the
linguistics of the source language and the linguistics of the target language are essential for a good translation system. The same applies to a machine translation system. Today many people try to criticize machine translation, but one of the reasons, I won't say the whole fault, is that human language itself is complex. It is filled with ambiguities. So developing algorithms based on that kind of data makes it difficult even for a machine to translate, especially between languages that are not similar.

A little bit of history. It all started after World War II, when 60 Russian sentences were translated into English with very good accuracy, and money went into that research. But after 10 years there was a setback, because the expectations were not met. Later, in the 1980s, with the increase in mathematical and statistical models, interest in machine translation came around once again, and by the 1990s there were many translation systems, but they were all translating single sentences. Later they started translating at the document level as well.

At the same time, in parallel, human translation was assisted by machines in the form of translation memory and machine-readable dictionaries. So human translation was assisted by machines, and we had machine translation systems assisted by humans. That brings us to the turn of the century, the last 20 years.
We can see that machines are now actually producing good-quality translation. Today machine translation systems are used for translating large numbers of documents at the United Nations or the European Union, where there is a multilingual culture. Similarly, machine translation systems are used to translate material in highly advanced fields where things change every day and relying on humans alone becomes difficult. For a human to translate, say, a 300-page book on the latest in the medical field may take one year, but by then things have changed in the field. So having a machine translate it within one hour is much easier, and on top of that, the domain expert and the language expert can post-edit it.

Let us quickly go over the existing machine translation systems in chronological order. We had rule-based systems, then example-based machine translation systems, then statistical ones, and today the more successful ones are based on neural network models.

These are some figures just to show that when the languages are very close, you can just use a dictionary to translate between them. As we go a little higher, there are transfer rules, so you can incorporate rules to translate between the two languages. Then there is interlingua: when you want to translate into multiple languages, say one source and many target languages, having an interlingua is the best method. But this was the early method.

Then came example-based machine translation (EBMT). For example, I took this from the web. You have this sentence, "I am going to the cinema," and the parallel corpus already had a similar sentence with 90% of the same words. Because the structures are the same, the machine would learn what "cinema" corresponds to. So when there is a parallel corpus, it aligns the sentences and then learns the equivalences of similar phrases. A little more advanced than EBMT is statistical machine translation, which keeps on learning.
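As a rough illustration of the example-based idea just described, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny English to French corpus, the `LEXICON` dictionary, and the function names are hypothetical, and French is used only because the pairs are easy to check. It finds the stored sentence closest to the input and reuses its translation, swapping in the one word that differs. Real EBMT systems align phrases automatically rather than relying on a hand-written lexicon.

```python
from difflib import SequenceMatcher

# Toy parallel corpus and word lexicon (hypothetical, for illustration only).
CORPUS = [
    ("i am going to the theatre", "je vais au théâtre"),
    ("she is reading a book", "elle lit un livre"),
]
LEXICON = {"theatre": "théâtre", "cinema": "cinéma", "book": "livre"}


def closest_example(sentence):
    """Return (score, source, target) for the most similar stored source sentence."""
    words = sentence.split()
    best = None
    for src, tgt in CORPUS:
        # Word-level similarity between the input and a stored source sentence.
        score = SequenceMatcher(None, words, src.split()).ratio()
        if best is None or score > best[0]:
            best = (score, src, tgt)
    return best


def translate_by_example(sentence):
    """Reuse the closest example's translation, swapping in the words that differ."""
    score, src, tgt = closest_example(sentence)
    old_words, new_words = src.split(), sentence.split()
    matcher = SequenceMatcher(None, old_words, new_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only handle one-for-one word replacements that the lexicon knows.
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            old, new = old_words[i1], new_words[j1]
            if old in LEXICON and new in LEXICON:
                tgt = tgt.replace(LEXICON[old], LEXICON[new])
    return score, tgt


if __name__ == "__main__":
    print(translate_by_example("i am going to the cinema"))
```

The similarity score stands in for the "mostly the same words" check from the talk. A sentence with no close match would still reuse the nearest example unchanged, which is one reason a threshold and human post-editing would be needed in practice.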
So there is machine learning being done, as in this example, again a picture I took from the web, which shows how the machine understands an ambiguous word and how it gets disambiguated, as it would by a human, in the presence of other words. And this is a neural machine translation model, which has many hidden layers, and this is closer to human translation. I am just saying that we have different methods, and today NMT and SMT, that is, the neural and the statistical ones, are the most successful methods.

Like any other application, evaluation of the systems is a must. The oldest method, of course, is human judges, but they are very expensive. We also have automated methods. Today we use BLEU, NIST, and METEOR.

So that's about machine translation systems. Most of you may have used at least two or three systems, even while writing Wikipedia articles. Some of them I have listed here, but there are many more commercial products as well. There are big companies as well as small companies in the market now.

Coming to why machines don't translate as expected: languages themselves have different structures and different morphology, that is, word structure. For example, Telugu does not have determiners. Finding equivalents of idioms, collocations, and metaphors is very difficult, and then there is ambiguity.

That said, I will just talk about the Telugu language, because that is what I wanted to share. This is Telugu, the greener part. It is a language spoken in the southern part of India, and it belongs to the Dravidian language family. But India has many more language families. So this is the Dravidian part.
We have Indo-Aryan, Indo-Iranian, and we have Austro-Asiatic. I don't know the exact number, but they say there are more than 800 languages in India. We can say at least 25 to 30 languages are more popular, in that education is imparted in those languages. But on Wikipedia, I think there are around 10 to 15 Indian languages, and Hindi has more articles compared to the other languages.
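The automated evaluation metrics mentioned earlier in the talk, such as BLEU, compare a system's output with a human reference translation by counting shared word sequences. The following Python sketch is a simplified, BLEU-style score and not the full metric: it assumes a single reference, works on one sentence, and uses n-grams only up to length 2 (`max_n=2`), whereas standard BLEU is normally computed over a whole test set with n-grams up to length 4. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with counts clipped."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clipping stops a repeated word from being credited more often
    # than it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def bleu_like(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times a brevity penalty."""
    precisions = [clipped_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize candidates that are shorter than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * geo_mean


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    print(bleu_like("the cat sat on the mat".split(), reference))
    print(bleu_like("the cat is on the mat".split(), reference))
```

A score like this only measures surface overlap with one reference, which is part of why human judges and post-editors remain necessary.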