Bonjour everybody. Well, he's Christian Perrier and he will be talking about internationalization in Debian. Thank you.
Do you hear me? Still? Again? Okay. Oh, that's not the title. Okay, that's the usual joke, and I say hi to my son, who is watching. This was a bet with him that I would show SpongeBob in one of my talks, and I did, so I won the bet.
Okay, let's go for the title. Well, as usual, this would be the gazillionth talk I do on internationalization in Debian at DebConf. I guess at DebConf 50 I will still be doing that very boring talk. Anyway, I know you all came for one thing, to see the nice maps, so I just put them at the beginning, and after that you can go away and do more interesting stuff.
So what will I talk about? I will try to give you a rough picture of who is involved in Debian internationalization, what we are working on, and how we are doing that work.
So who is working? In Debian, we have this common sense that we are a very international project. I know you don't see anything; there are yellow spots on the map. You probably know this map. This is the map of the developers all around the world. So we feel we are very international. If you look at the number of developers, that's less obvious. There are six developers in Africa, 263 in North America, 33 in South America. That's a lot less. 51 in Asia, mostly in Japan. Six in Africa again; okay, I messed up the slides. And a lot in Europe. That's too much for me to say in English, so: [number given in French]. And Oceania is basically this thing on the bottom right, which is named Australia and New Zealand. So as you can see, we are not that international. There are very wide parts of the world where there are basically no Debian developers.
But, okay. So are we that international? These are the maps. For those who have not seen this yet, we are basically painting the world in red.
So in Debian Potato, we had one language covered to install the system. In Debian Woody, we had 16. Then we jumped to 40 for Debian Sarge. Etch, our last release, has 68 languages covered in the installation system. And Lenny will have, I guess, 60. Okay, let's do it again, because we all love that. If that wants to work... crash, crash. And oh, yeah, I forgot. Maybe someday we will have 60, 78. So let's go back to see if it... okay, it doesn't want to go back. And painting the world in red. So I would say we are that international after all. So with the i18n crew, I would say: wunderbar.
Okay. Let's try to get a closer picture of all these places. In Africa, you have seen the map. The map is very empty, basically. I see two real reasons. We have strong, let's say, colonial languages: French, English, and to some extent Arabic. You see, these are language families in Africa, not all languages. You see a strong domination of green for Arabic, of blue for French and of orange for English. This is what, in my sense, limits the development of internationalization in Africa, because the communities over there have a really hard time feeling that they should translate stuff into their local language. One exception is South Africa, where there are some initiatives to develop localization. In particular ANLoc, the African Network for Localization project, originates in South Africa. And they led to the development of some tools, namely Pootle, which we are using in Debian and which originates in South Africa, from the translate.org.za project. So Africa is becoming active, not visible in Debian, more visible in end-user projects.
So let's go to North America, North and Central America, the two together. Well, we have 100% coverage for what I would call the colonial languages: English, basically, and Spanish for Central America. But we could also have more coverage for indigenous languages.
There are some initiatives that are beginning to cover the native languages of North America and, as we'll see later, South America as well. There is a great variety of languages and a great cultural richness to cover, and a few initiatives are beginning to work on translation for these languages. So I would like to introduce you to Secwepemctsín. Okay, everybody got it? Secwepemctsín, or Shuswap, translation. This is an interesting effort that began in Canada, for a tribe of, I would say, 500 people in Canada's Alberta who speak this language, which I will not pronounce a second time. And I did it very badly anyway. So this is something that will develop, I guess, even in North America. There is not only English and Spanish.
Let's go down, down under, to South America. Okay, in South America we have two very strong internationalization communities. Both the Spanish one and the Brazilian Portuguese one are very strong, very efficient at translating and localizing stuff into their languages. And again, we could have the very same interesting situation with indigenous languages. Even more so in South America, there is a strong will to develop localization for indigenous languages. I had very interesting talks with people from Bolivia and from Venezuela about localization in Aymara, in Quechua, and in other languages I don't remember. And this is beginning again. Also in Argentina, there are indigenous languages, even though there are some unclassified or undocumented areas. So you will all be happy to learn we are living in an unclassified area these days, but still.
One challenge for these communities is to be able to translate not from English but, to take the example of Quechua or Aymara, mostly from Spanish. This brings some interesting technical challenges for people who are developing translation tools, because we should allow people to work from another language.
The same holds for Africa, where many people want to translate from French.
Asia. The picture of Asia is pretty hard. I made it, on purpose, very confusing, very complicated. I don't expect you to read anything. But the basic picture is that we have strong communities in China, Japan, and Korea. They are doing a lot of work for internationalization, especially in Debian. We have a very strong community in South Asia, mostly India, which is translating into all the languages from this place. There are 21 or 22, I never remember, official languages in India, in nine scripts, nine ways to write things down, these ones. This is the letter KHA in all scripts of India. And there are very good developments in that matter, especially also in Debian. We have a few localizations, including the installer. There are also some local and governmental initiatives, particularly again in India but also, as some of you may know, in this small country named Bhutan, which is quite a model for me of how to develop localization in a local language.
Europe. We all know about this mosaic of languages. Of course, Europe has numerous efforts to localize software. We have a lot of communities in Debian. We are translating into European languages. We cover nearly all of them. We have strengths, but we have some weaknesses currently. And these weaknesses also apply to, I would say, nearly all the localization communities. As I explained to people all this week, I feel like our localization teams are weakening these days. We don't see many new people coming into our teams. I don't know exactly the real reason, but they are going somewhere else. I think I know where they are going. Most of the people who are new to free software want to work on translation when they are non-technical people. And most of the people who come to free software, to the Debian world, currently come through Ubuntu. And there is a strong demand in Ubuntu for localization, translation.
They have an infrastructure, but this thing is not very organized these days and currently doesn't interact very well with Debian. We had discussions this week about this.
About Europe, I wanted to mention that many people feel like we have strong localization teams. The French team is known as the strongest team. We do everything. In fact, we are very weak. We are few people, not that many. And we are weakening. That's important.
Oceania is basically Australia and New Zealand. At the moment, I don't know about many people or many initiatives over there to localize, to work on internationalization. Except if we include Indonesia, where there are a few local initiatives, mostly from individuals. I don't know about many governmental efforts to develop localization over there.
So what are we translating now? I spent a lot of time on who is translating. So what are we translating? Well, the most well known, my pet stuff in Debian: this is basically debconf templates. Not because this is the most end-user oriented, but probably because this is the easiest to do. These are nice graphs, thanks to Nicolas François. They show the development of translation between the Etch and Lenny release cycles. These are interesting tools to see which teams are strong and which teams are weakening. For instance, you see the Portuguese team. I don't know if you see my spot. Yes. This is a huge effort to bring stuff in. On the other hand, I think this is the Vietnamese team, which is weakening slightly. Or you can see somewhere, I don't know where, the Swedish team, which was weakening and then revived recently. But as you can see, there is a good translation ratio for these debconf templates, because we have good tools. What happened at the end of January? Probably Nicolas fixed some bug in the graph stuff.
For these debconf templates, this user interaction, we are also doing something some of you may have heard about: English reviews. I launched that back in 2007.
It started under the name of the Smith Review Project, on April 1st, so no one believed this was serious. This was serious. And on debian-l10n-english the effort is currently stopped, but we will renew it for Lenny plus one. We reviewed 150 packages, for which we reviewed debconf templates and package descriptions.
And of course, there is this famous NMU campaign, which is aimed at bringing the translation ratio of these debconf templates up and up and up. It started in May, which is easy to spot here for the Galician translation, which benefited a lot from it. And I actually NMU'd more than 100 packages out of 600 or nearly 700 which are using debconf. Big effort, easy to do, and good results.
Another target is native Debian packages. These are our stuff, not upstream. We don't want to translate upstream; we don't want to translate GNOME or KDE, they have their own internationalization teams. But dpkg, apt, aptitude, this is our stuff, and we have to translate it. We have a very good internationalization of these things. We have good tools for doing this internationalization and good tools to track them down. This is huge. This is difficult. I think that someone in the room has been involved in dpkg translation. Yeah, he is here. This is a nightmare, basically a nightmare, because it is very technical stuff, and how do you translate that? But it is quite worth it. Having aptitude translated into French, Spanish or whatever is very good. So the results are here. Many languages have a good translation ratio for these difficult tools. I'm sorry for this table, which is probably unreadable.
But anyway. Package descriptions: that's the Debian Description Translation Project, named DDTP. This was a Lenny release goal, so the tools had to cope with the translated descriptions of packages. We achieved an improved infrastructure, the DDTP and what is named the DDTSS, which is basically a web interface to get and send package description translations. We improved the methods.
We are trying to improve the English usage in these descriptions, but we reviewed only 150 packages. You know the number of binary packages in Debian: 20,000 or so. So this is a huge task. And I always tell maintainers: please pay attention to the use of English in your packages. Please ask for review and get yourself corrected. Even the native speakers, by the way.
And now this is the most important news. This is now flowing to the archive, thanks to the huge work of the FTP Masters team and the i18n team in this room. This week, since yesterday, the package descriptions that are translated in the DDTP are flowing to the archive, and you can see them when you are using packages.debian.org or aptitude. So this project, which started a few years ago, is finally giving results. And I think they both deserve a big, big applause for that. [unclear]
You can see in aptitude, or actually you cannot see, but this is displaying the description in French of my pet package, namely GeneWeb. And this is the same stuff on packages.debian.org. So as soon as someone translates for the DDTP, this will flow down here. And this is very end-user oriented. Contrary to debconf templates or dpkg, which are oriented to administrators, this is end-user oriented, and a huge task. Now we have a huge challenge, because we have to translate all these damn package descriptions. So please write them correctly, but not too long, please.
Translation of documentation is a huge task, a long-standing task. We have large documents. They are basically written in HTML or XML. And for most of them, unfortunately, we don't have a very efficient or easy-to-use infrastructure. The current translators have to write XML files from the original ones. So it is very manual stuff, not very easy. Some of them, such as man pages, are going through a gettext infrastructure with the use of po4a. We are po4a-izing everything.
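
As a rough illustration of the gettext approach to documentation mentioned here: tools such as po4a work paragraph by paragraph, so a translator only deals with paragraphs that are new or changed, instead of rewriting a whole XML file. The Python sketch below is not po4a itself. The function names, and the plain dictionary standing in for a PO catalog, are assumptions made for illustration.

```python
def split_paragraphs(text):
    """Split a document into paragraphs separated by blank lines."""
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


def translate_document(original, catalog):
    """Rebuild a document from a paragraph -> translation mapping.

    Paragraphs without a translation stay in the original language and
    are listed in `missing`, so a translator sees what is left to do.
    """
    output, missing = [], []
    for paragraph in split_paragraphs(original):
        translated = catalog.get(paragraph)
        if translated:
            output.append(translated)
        else:
            output.append(paragraph)
            missing.append(paragraph)
    return "\n\n".join(output), missing


if __name__ == "__main__":
    page = "Install the package.\n\nRun the tool."
    french = {"Install the package.": "Installez le paquet."}
    text, todo = translate_document(page, french)
    print(text)
    print("Still to translate:", todo)
```

When the original page gains or changes a paragraph, only that paragraph shows up as missing; everything already translated is reused as is.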
Some documentation is translatable this way, such as the developer's reference. Many man pages are also translatable with gettext. So the i18n team is trying to provide patches for this. And I urge maintainers to please adopt these patches. It makes our life much, much easier.
The website is also a good target for translation in Debian. It is a long-standing effort. It is giving good results. These statistics you are seeing, I picked them to illustrate the slide and not make it too empty, because Nicolas activated the statistics very recently, so we don't have a graph over a long time. So it basically means nothing. But as you can see, there are many languages into which our website is translated. Unfortunately, this is done with a quite difficult method of work, not easy for newcomers to cope with. So we should improve that part also, and probably use po4a.
There is also another target, from some discussions I had this week with the wiki people, with Franklin, namely to try improving the translation of wiki.debian.org, which is becoming a more and more important source of information for our users. So we should try to have an efficient way to translate this. Currently we do not.
How are we doing things? How are we doing all these things? I will not show you everything. Of course, that's not my point. The point is to give a general picture and anyway, I am not the one doing the work. I am just the one presenting the work and taking the credit for it.
One of the main points in the recent developments is the introduction of the new server named Churro, for Spanish speakers. It was introduced in 2006 during one of the Extremadura sessions and it is becoming the main i18n server in Debian, currently under i18n.debian.net. This is not administered by DSA. This is administered by ourselves, I would say by Felipe mostly. Please raise your hand, Felipe. Thank you. All the hidden magic we have, all the statistics you can see on the website:
Most of the material is coming from Churro, thanks to many, many scripts and much magic stuff written either by Felipe or by Nicolas, who is standing close to him. At the end of 2007, also in an Extremadura session, we moved the DDTP, which was running on one of the servers of Grisu (Grisu, thank you), to Churro. Yes. This guy deserves the applause. Yes. And all this stuff is now running from this server, which is becoming more and more important for Debian.
And someday in 2008, I forgot to mention, I put the Pootle interface in production. Pootle is a web interface for translators to get translations and even do translation online. And this interface is now able to directly commit translations for the Debian installer into the Debian installer SVN, which is kind of the way we would like to work someday: have a local repository for translations and hook some stuff up on it. Pootle, a web interface, but maybe translators will want to work directly in SVN, or maybe some other magic. Maybe someday we will want to hook Rosetta on it, why not, to exchange information with the people who are doing the translation and avoid duplication of efforts. There have been some discussions about this. And someday, maybe in 2009, we will be able to hook the debconf translations on it, this thing I was talking about at the beginning, this easy stuff, and maybe move to i18n.debian.org. That's psychological, you know: turn this into a real production server. This is going on. And again, this will maybe happen thanks to one or maybe two sessions in Extremadura. Again, thank you for hosting i18n.debian.net. Again, some applause.
The workflow we have... I started to talk about it before, anyway. The problem we have in Debian is that our material is very dispersed. We have things in packages, we have things on the website, in the wiki, et cetera. And we need to collect this into some place, this central place I was talking about. So that needs a lot of scripts and magic to bring this to a central place.
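
A minimal sketch of the kind of collecting script described here, assuming unpacked source packages sit side by side under one directory and keep their debconf translations in `debian/po/<lang>.po` (the usual po-debconf layout). The function name and the single-directory arrangement are hypothetical; the real scripts running on the i18n server are not shown in the talk.

```python
from collections import defaultdict
from pathlib import Path


def collect_po_files(source_root):
    """Index debconf translations found under source_root by language.

    Expects a layout such as <source_root>/<package>/debian/po/fr.po and
    returns {"fr": [Path, ...], ...}. Template files (*.pot) do not match
    the *.po pattern and are therefore skipped.
    """
    by_language = defaultdict(list)
    for po_file in Path(source_root).glob("*/debian/po/*.po"):
        # The file name without its suffix is the language code.
        by_language[po_file.stem].append(po_file)
    return {lang: sorted(paths) for lang, paths in sorted(by_language.items())}


if __name__ == "__main__":
    for lang, paths in collect_po_files(".").items():
        print(f"{lang}: {len(paths)} file(s)")
```

An index like this is the starting point for per-language statistics, or for feeding the files into one central repository.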
I just discovered my question mark went upside down. Interesting. Spanish effect, I guess. We want to hook Pootle on it. But this will bring us a lot of scalability problems, especially if we hook the Debian descriptions. There are 20,000 packages, with about three or four paragraphs per package. That means a huge number of PO files. And Pootle, for instance, has never been designed for such an amount of information. So we are working with the Pootle developers to improve the efficiency of their software also.
One interesting development, namely from this week, is a direct result of DebConf and DebCamp: the tdebs. I will give you some words about the tdebs without going into the nasty details, because I don't understand them. Tdebs come from two ideas. The first one is to make packages lighter. Localization stuff is very heavy, very space consuming, particularly for embedded devices. So the idea of tdebs was recently revived by Neil Williams (please raise your hand, thank you) to allow Emdebian to remove translations and bring them back only for the languages that are needed. This is basically the idea. And idea number two, okay, I added this slide very late, so this is idea number two, is to allow independent updates of translations. For instance, to be able to update the aptitude translation for stable after stable has been released, independently of the package. These two ideas combine in the idea of tdebs, which is basically having some special deb packages containing the localization and translation material.
So this has a huge impact all over Debian, basically. This week, we had a few discussions first and then a very, very efficient meeting with the FTP master; Neil was there, the i18n folks were there, the release manager, Luk, was there. And we went brainstorming, and this gave the results in the small wiki page here. Thank you, Mark, for collecting the ideas.
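
To make the first tdeb idea concrete, here is a minimal sketch of separating a package's translated message catalogs from the rest of its files, one group per language. It is not the actual tdeb tooling discussed at the meeting. The function name is hypothetical, and the sketch only assumes the standard gettext install path `usr/share/locale/<lang>/LC_MESSAGES/*.mo`.

```python
import re
from collections import defaultdict

# Compiled gettext catalogs are installed under this path, per language.
LOCALE_PATH = re.compile(r"^usr/share/locale/([^/]+)/LC_MESSAGES/[^/]+\.mo$")


def split_translations(file_list):
    """Separate a package's file list into core files and per-language files.

    Returns (core, per_language), where per_language maps a language code
    to the catalog files that a separate translation package could carry.
    """
    core, per_language = [], defaultdict(list)
    for path in file_list:
        match = LOCALE_PATH.match(path)
        if match:
            per_language[match.group(1)].append(path)
        else:
            core.append(path)
    return core, dict(per_language)


if __name__ == "__main__":
    files = [
        "usr/bin/aptitude",
        "usr/share/locale/fr/LC_MESSAGES/aptitude.mo",
        "usr/share/locale/es/LC_MESSAGES/aptitude.mo",
    ]
    core, translations = split_translations(files)
    print("core:", core)
    print("languages:", sorted(translations))
```

With such a split, an embedded system could install only the core files plus the languages it needs, and a translation group could be updated without touching the core package.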
And basically, I think, progressively, we will adapt the tools for Lenny plus one, and make some packages use tdebs for Lenny plus one. And the point is to make it a release goal for Lenny plus two. This is a long-standing effort; it will take a lot of time, discussions, meetings. We will have one in November in Extremadura about this, I hope. So that will be huge progress, I think, because we will disconnect translations from packages in some way, but not too much.
I will end by talking about the i18n task force. This is an idea I got when Steve did this team survey. If some of you read the report, he mentioned especially the i18n team as a strange case, because many people were saying this is a good team. And I was saying: well, I'm not sure this is a good team, because I'm acting too much as this kind of team leader or whatever, so that I feel sometimes that I'm centralizing stuff and people are just waiting for me to start a project or whatever. This is the challenge for teams, you know. So basically, we are slowly building an i18n core team. Those who are doing the things are, well, I already mentioned a few people around here, but those people who run Churro are basically those people.
We will need to develop collaboration. I mentioned earlier that the localization resources are weakening in some way; the people who want to work on localization tend to go elsewhere, namely to Ubuntu. Whether we like it or not, this is a fact. So we need to develop collaboration, talk with Ubuntu people to develop this, and help them to organize their resources, because they are not organized this way. This was confirmed to me this week. So we need to talk together, whatever. I don't know. We also need to lead this community, to animate this community, which is quite a challenge. Well, I know that many people are expecting this from me, but that's a huge task and not easy to do. So that's the point of the i18n task force. I need more good ideas about it.
Okay, [unclear]. And basically, that was my final point. Not really deep, hardcore technical stuff. And yes, I did put SpongeBob in again. Yes, I won the bet twice. And now I'm open to your questions and whatever you might want to ask me. And please speak slowly if you are a native speaker, for God's sake. You're not, but please speak slowly.
This question might be coming out of my own ignorance, but I was wondering about the developers who are not directly involved in translation. Perhaps, you guys, we don't know enough languages to be of much use. How can we help? Is there a central place that we can go to where we could learn the kinds of things that can be done to make the job easier for you? A specific point is that I have been wanting to convert all my man pages to po4a, and I have tried looking it up a couple of times and I got lost. And I have never actually managed to do so, even though I want to. So where can I go to get help to help?
How could developers get more help or more information about internationalization? I think this is one of the challenges I mentioned earlier. We did a lot of development, a lot of good work. You mentioned po4a, and now we should also think about documenting it. I've been requested many, many times to document, in the developer's reference for instance, what I often explain to maintainers: how to do things well. So we have to do that. The challenge we are facing is that this is a very time-consuming task. And the only people who know how to do these things, such as po4a... our po4a master, Nicolas, has only two hands, and these hands are already doing some very good stuff on the i18n servers. So yes, please, Nicolas, next time, please document the po4a process. That's an order. You have to do it. Thank you. No, seriously speaking, yes, we have a challenge. We need to collect this documentation, and I think i18n.debian.org could also become a central place for this. Marga?
I have a few questions from IRC. First, someone wants to know some numbers about the people working on internationalization in Debian. How many people are working, and how many people would be expected to be in the core team you mentioned?
How many people are working on translation in Debian? That's a very tough question. Actually, I would say it's easy for many languages: one person for Tamil, one person for Gujarati. But dozens of persons for Spanish, maybe dozens of persons for French; we don't really count. We don't ask people to register on some fancy website with nice colors, and therefore we cannot count. They just show up. They subscribe to a mailing list and they start collaborating, and that's all. This is our only process. So frankly speaking, I don't know. I would say not more than a few per language. And we have, well, if we count D-I, I said 60 languages. But most languages are only covering D-I and not much more. So I would say not more than 10 to 20 for the biggest teams, and down to one person per language. What I would suggest to people wanting to join and know more is to subscribe to the debian-i18n mailing list and start asking over there.
About this core team: basically, that will be the people who invest themselves the most and start to work on these things we need to do about tdebs, or documenting the processes, or whatever. I don't think it really needs to be formalized, to say "I am a member of the core team. Yes, good. I am the leader." No, but it always happens that teams form when people are doing things. We became a team because a few people just joined together. That's all. This is what I call a team. And as far as I know, this is how all teams are working in Debian.
So, second question from IRC: Ingosite wants to know about quality control for the translations, because some translations are not so nice. So is there any plan for doing any kind of quality control?
So quality control, reviewing, improving the translation, basically. Well, this is a recurrent question in translating. Should we translate everything and do the QA afterwards, or do the QA in the process? This is basically up to the teams. The Spanish team may want to use its own process for doing QA. They have a very formalized one, the same one we have for the French translation. Other teams are just doing stuff. This is their process. The point would be, yes, the quality control should happen during the translation process. For instance, I mentioned Pootle, I mentioned a web interface. One could speak also about the other well-known interface, which is Rosetta. This should include quality control, review, criticism, suggestions and corrections. But in the end, the limit is the number of people doing the work. And I can tell you, even for French, the quality control is not that good. There are some horrible mistakes slipping through: typos, whatever, spelling errors, bad translations and the like. This is probably most important currently for the package descriptions. We need to enforce a better quality control. Question over here? I hope I answered.
I just wanted to add that we have the same problem for the English package descriptions. I mean, there are also some very horrible ones. You really can't understand many of them, with typos and stuff. So it's not a problem that's unique to translations. We have the same problem with the English ones as well.
Yeah, quality control, or the control of the English usage, is certainly important. Well, this is basically what I mentioned. Theoretically, each English text exposed to the public (package descriptions, website, documentation) should be reviewed by the debian-l10n-english mailing list. But this mailing list currently has, I would say, two or three active members. How many developers did I mention in North America? 260 and a little bit more. There are 70 Debian developers in the UK.
So yeah, maybe some people should also work on English localization. We need you guys to improve our own English. And you need us to improve your own English also.
You mentioned Pootle. I just went to the Pootle website for Debian. And there are the D-I translations that you said are committed directly to the SVN, and the po-debconf translations. What happens with those translations? What is currently on Pootle? What is accessible to Pootle, and when?
Yeah, rock and roll. Thank you. As I tried to mention on the Pootle website, the only production, I would say semi-production, stuff is D-I. This is the only thing that flows somewhere. Pootle is experimental. I hope you saw on the Pootle website that this is in some way experimental. Anyway, nobody has access currently. So actually, people can only view the Pootle stuff but cannot translate this way; make suggestions maybe, but that's all. Actually, the Pootle is only my own sandbox. We probably need a sandbox server for me to play on and not break existing stuff. But this is what I mentioned. This is the future. It does not end up anywhere. If I have time, I may ask about this. The workflow we expect with Pootle specifically is allowing translators to make translations either through Pootle or through SVN access to the central SVN, where we will keep these Pootle translations. And we have to find a way for maintainers to grab these translations at build time. Maybe some debhelper tool, some hook in po-debconf. I don't know. But we need to gather these things and allow you developers to grab these things down into your packages. But I don't know how to do that currently.
Christian, one of the things we looked at with the FTP masters was that whole idea of possibly having a second diff.gz that contains the updates for the translations. And that could be a feed into that, to help the maintainers get the updated translations at build time, because they would then be part of apt-get source or dpkg-source -x.
You mean having a second diff.gz for updated translations, which could include the po-debconf stuff also?
Yes.
Yes. And the point could then be that our central place would generate this stuff. That could be an idea. Similarly to the tdeb stuff.
Yes. It would be based on the tdeb scripts and the tdeb generation process.
That's interesting. Yeah, I think this is a way to strike the balance between packages being dispersed and maintained individually, and the need for translation to have a central place. That's the balance. Are there other questions or comments? Here, Nicolas.
I would like to give an answer to Manoj's initial question. I think if you have some request to internationalize some program or documentation, one way could be to send a request to the debian-i18n mailing list. I'm reading that mailing list, so I would be able to answer, but I'm pretty sure there are other people who can work with po4a. And if you have some request to localize some package, I mean to translate the package, then there is one tool which is named podebconf-report-po. In the version in unstable and Lenny, you just have to go into the po directory and call podebconf-report-po --call. This will send an email to all the previous translators, with the language mailing list in copy, the language mailing list which is specified in the PO file. And it will also send an email to the debian-i18n mailing list, so that if the document is not translated in a language, the language coordinator for that language can request their team to translate the document. I think this is what could be recommended to the developers, and it is what is currently, or what is going to be, documented in the developer's reference.
Actually, I have a patch for the developer's reference to explain all that Nicolas explained. But Nicolas, are you ready to cope with Manoj's debian/rules files?
I had a follow-up to that: scalability. I have 36 packages. I can send you 36 requests, but then there comes the next developer.
I know people who have 200 packages. And, as Christian mentioned, you are already busy. So even though this is a one-off thing and I can ask you to help me, I was looking for something where people like me, who are not very familiar with internationalization, can do things on our own: set up the infrastructure in my rules file without having to bother you, and so do as much as I can to move the process along. And if such a document can be created, it doesn't need to just reside in the developer's reference. I think this is important enough to move into the technical policy, as something that people must do. I think internationalization is important enough that it shouldn't be just a best practice. It should be: this is what the project has said that we will do in order to make Debian more useful. But before I can put it into policy, it has to be something more than me personally requesting a favor. There needs to be documentation and a process, which can then be put into policy. Do you see the difference?
Yeah. Thank you, Manoj. I think we're running out of time, so I will have to stop the questions and comments. And I have to thank you, everybody, for listening to this very early talk after the party, and for waking up. And now I need some coffee. Thank you.