So the structure of this supposed tutorial is, first, explaining what will be in the talk. That's nice. And then digging into what the deal is with gettext translation for programs, for debconf, and for man pages.

So first of all, about the tutorial structure. What is in this tutorial? What will I talk about? Mostly about localization of already internationalized software. Also handling Debian-specific localization problems in packages, mostly debconf. And it's probably a far too short tutorial. Talking about gettext translation in 45 minutes is kind of a challenge. Especially when you talk too much.

What is not in the talk? It's not a talk about internationalization. Kenshi Muto gave one last year. It's not exactly a course about gettext, and especially not about gettext in programs. I'm completely unable to do that. It's not a talk about character encodings or explanations about character encodings. It's supposed to be completely informal, as you may have seen. And it's completely badly prepared. Thanks to LaTeX Beamer, it looks very nice, but no, that's not true.

So a few reminders. I will bore you a little bit. First of all, what is internationalization? As you can read, it's basically getting the software ready. Localization is basically about translating software. Multilingualization is about getting the software able to handle multiple languages at the same time. Globalization is all that big thing. And finally, the last one. Oh. You missed the joke. Okay, this is about getting bored by everything I'm talking about. Yeah? Sorry? "Would you define the codes?" The i18n, the l10n: this is the number of letters between the first and the last letter. Okay, so m17n, okay, it's a short way to say it, because "internationalization", well, I say it badly anyway.

So now, I was talking too much, so let's better dig in the dirt. So dig in the dirt, let's find the way we'll get hurt, if this says something to some people around.
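The abbreviation rule mentioned above (first letter, number of letters in between, last letter) can be sketched in a few lines of Python. The function name `numeronym` is only an illustration, not something from the talk.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word.lower()
    return f"{word[0]}{len(word) - 2}{word[-1]}".lower()


# The four terms listed on the slide.
for term in ("internationalization", "localization",
             "multilingualization", "globalization"):
    print(term, "->", numeronym(term))
```

So "internationalization" has 18 letters between its "i" and its final "n", hence i18n.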
I will take the GNU Hello program as an example. It's quite a good example of already well internationalized software, and it's quite a good example of a Debian package, by the way, maintained by Javier Fernández-Sanguino Peña if I remember well. Anyway, GNU Hello.

We will talk first about, so I said, program translations, and now I will be in trouble because I have to dig into it, and I was supposed to do it with my screen and using screen, and I don't have any screen. So, let's try. So here I have the sources of GNU Hello. Is the display okay for everyone? Hello? Okay. "Can you increase the font?" Increase the font. You're challenging me. I'm using KDE, so sorry for the GNOME people, and I use it in French. So it should be something like this, but I'm not sure, so if I screw things up... "Police", yeah, yeah, and it's very easy to do while looking up and using your touchpad. Enormous "police". Wow, giant, that's fine. Okay, so, okay, well, a big mess. Of course, because of the giant font.

Let's just dig in the dirt and see. It will probably be a little bit small. I'm sorry for this. Are you okay? Okay, this is an internationalized program. This is not a C course, I say it now. I will just give you a few examples of where you can see that the program has a chance of being internationalized, okay? This is a good sign: an "if ENABLE_NLS", enable national language support, if I remember correctly, okay? Then it defines gettext of a string as underscore of the string, okay? So we will find that later in the sources. I'm not a C wizard, so if you hear me saying things which are stupid, this is pretty normal. These are stupid things.

Anyway, at some moment this program says "hello, world". This is its purpose. So as you may see, it says "hello, world" and it's internationalized because of this underscore character, okay? Here you see some comments that seem to be intended for translators, where the program author leaves a few comments for translators. Use box drawing, blah, blah, blah.
Use UTF-8, blah, blah, blah, okay? So that's interesting, because you see how program authors and program translators may interact with each other. This is quite important, and we go back to my slides now.

So what I'm supposed to do according to my schedule, yeah. After digging in, we have our first lesson. Lesson number one, for software authors: please comment the sources. If you have something to say to your translators, please comment the sources. It's very important. For programs with confusing messages such as, well, let's say dpkg or APT, please comment. And by the way, translators, please whine when you don't have enough comments. Please whine. You're welcome to whine, okay?

Now, we haven't seen yet this, oh, oh, oh, stop. This is what I like in LaTeX Beamer. Okay, this mysterious PO file. I used to say "P-O". People like to say "po files". So let's say PO files. The PO files, which will hold the translations; we will see these files. They usually lie in a directory named po, but that's not a constant. You can do what you want. PO means portable object. They are meant to be compiled by the program msgfmt into MO files, machine object files, okay? The advantage is that they are text files, very simple to handle, and they are basically one header and strings. We will see PO files. There are a few caveats. They are based on English as a reference, and this may become a problem when the English is confusing, for very short strings for instance, or other problems. But this is quite common.

So we'll see one PO file. Okay, I forgot to... I will be screwed with a lot of windows, so come on. Okay, let's go and see the work of our French translator. I'm using Emacs, sorry for you vi gurus. Here is a PO file with its headers. Can you hear me all the time? Okay, that's fine. The headers are: Project-Id-Version. Basically you can put pretty much what you want here. The common use is to put the program name, but there is no very well accepted convention.
Report-Msgid-Bugs-To. Well, I haven't yet seen a very good use of it. It's put there by the gettext utilities, but it's less often used. The POT-Creation-Date: this is the date the English messages were last updated. So if you, as a software author, happen to change strings, this date will be changed. But when translators update their translation, it should not be updated. The PO-Revision-Date is obviously the date of the last, excuse me, the last change by the translator. And the Last-Translator is Michel Robitaille. Okay, this is supposed to hold the name of the last translator. It has to be put here by the last translator, not by the software author. Language-Team, it depends. Here you have the French language team from the Translation Project. And a few MIME headers, I won't comment on them, except basically the charset. The charset is chosen by the translator, never ever by the software author. Don't mess with the charset, please. If the French translator wants to use this ISO encoding, don't change it. This is our problem, not the software author's problem.

Here you have a few strings, so you always see the same couple: msgid, msgstr, msgid, msgstr, and so on, and so on, with the original string and then the translation. There are absolutely no formatting issues. So if I extend this line, I can, as much as I want, there's basically no problem. It will be reformatted later anyway, and there is no real difference if I reformat it on several lines, like probably somewhere down below, except when it's ended by backslash n, of course, which marks a hard return.

Okay, let's stop messing with this French translation, because I did exactly what a software author should not do. Just don't open these things. Okay, I should answer no here, because I don't mess with the translation. I, the software author; actually, a very bad one. Okay, so, well, I screwed up my talk because, of course, I was not supposed to do this, anyway.
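Two points from the PO file walkthrough can be sketched in Python: a value may be wrapped over several quoted lines without changing its meaning, and the header is just the translation of the empty msgid, made of "Field: value" lines. This is a minimal sketch, not a real PO parser. The helper name `po_string`, the dates and the e-mail addresses are invented for illustration, and it assumes that the C-style escapes used in PO files can be read with Python's string-literal rules, which holds for simple cases like `\n`.

```python
import ast


def po_string(*chunks: str) -> str:
    """Join the quoted chunks that make up one msgid or msgstr value."""
    return "".join(ast.literal_eval(chunk) for chunk in chunks)


# The same translation written on one line and wrapped over three lines.
one_line = po_string(r'"Affiche un message de bienvenue.\n"')
wrapped = po_string('""', '"Affiche un message "', r'"de bienvenue.\n"')
print(one_line == wrapped)

# A header like the one shown on screen (all values here are made up).
header = po_string(
    r'"Project-Id-Version: hello\n"',
    r'"POT-Creation-Date: 2006-01-01 12:00+0000\n"',
    r'"PO-Revision-Date: 2006-02-01 12:00+0000\n"',
    r'"Last-Translator: Some Translator <translator@example.org>\n"',
    r'"Language-Team: French <team@example.org>\n"',
    r'"Content-Type: text/plain; charset=ISO-8859-1\n"',
)
fields = dict(line.split(": ", 1) for line in header.splitlines())
charset = fields["Content-Type"].split("charset=")[1]
print(fields["Last-Translator"], "|", charset)
```

The only thing that matters in the wrapped form is the explicit `\n` inside the quotes; the physical line breaks of the file carry no meaning.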
I was supposed to go into the po directory, so, anyway, I will go. Sorry for this, I told you it was badly prepared. So let's see what's in this directory, which will not be obvious because of the giant font. We have a few Makefile things, a file named LINGUAS. Important thing, we'll talk about it. Makefile stuff, I won't comment on it because I'm not a Makefile guy. POTFILES, important thing. And a lot of PO and GMO and PO and GMO files. Okay, the PO files are the original files, the GMO files are the compiled files, right?

So we have LINGUAS. LINGUAS lists which languages are supported, which languages will be compiled with the software. There are various methods, depending on the various build methods, I won't comment on them, but this is usually the place where you list the supported languages. POTFILES defines which files actually contain translatable material. That's POTFILES.in, or POTFILES.

The POT file: POT is meant to mean PO template file. It contains the original English-language messages. It just looks like an empty PO file. So creating a new translation is just a matter of copying the POT file under the right name and then starting to work on it.

And the PO files: there is one per language, or one per translation. The name usually uses the ISO 639 code and nothing else, please. Please check the ISO codes. Please don't invent ISO codes, of course, and don't do stuff like this: if I list the German translations, you see we have a de.po, not very obvious, and a de_DE.po. So that's supposed to be a German translation and a German-for-Germany one. This is just wasted translation effort and it happens very, very often. So, Debian packagers, just look at your upstream sources and whine when this happens. This should not happen. And it must not happen in Debian translations, with a few exceptions. Come back, come back, my slides, please, okay.
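The naming advice can be turned into a small check that a packager might run over the po directory listing. This is only a sketch: the function name, the sample file list and the allowed-variant set are assumptions made for illustration (the variants listed are the ones the talk names as accepted exceptions), and the pattern only checks the shape of the name, not whether the code is actually registered in ISO 639.

```python
import re

# ll.po or lll.po, optionally with a _CC country suffix.
PO_NAME = re.compile(r"^(?P<lang>[a-z]{2,3})(?:_(?P<country>[A-Z]{2}))?\.po$")

# Country variants the talk mentions as legitimate exceptions.
KNOWN_VARIANTS = {"pt_BR", "zh_CN", "zh_TW"}


def review_po_names(filenames):
    """Return (badly_named, redundant_variants) for a list of PO file names."""
    plain, variants, bad = set(), [], []
    for name in filenames:
        match = PO_NAME.match(name)
        if match is None:
            bad.append(name)
        elif match.group("country"):
            variants.append((name, match.group("lang")))
        else:
            plain.add(match.group("lang"))
    redundant = [name for name, lang in variants
                 if lang in plain and name[:-3] not in KNOWN_VARIANTS]
    return bad, redundant


names = ["de.po", "de_DE.po", "fr.po", "pt.po", "pt_BR.po", "french.po", "tlh.po"]
bad, redundant = review_po_names(names)
print("badly named:", bad)
print("probably redundant:", redundant)
```

With this invented listing, `french.po` is reported as badly named and `de_DE.po` as a probable duplicate of `de.po`, while `pt_BR.po` is left alone.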
So, lesson number two. I'm going back to my slides and to lesson number two; I said important things, you know. Never, ever change translations. As a Debian packager or as a software author, leave this to the translator. Please do not change encodings without any reason. You may want your software to be UTF-8 compliant. That's fine. You can perfectly well have UTF-8 compliant software without UTF-8 translations. This is the job of the translator. And do not use xx_YY codes without knowing whether they are necessary. To my knowledge, in Debian we have basically three exceptions to this. pt_BR for Brazilian, and pt for regular, so-called regular, Portuguese. Two ways of writing Chinese: Simplified, zh_CN, mostly used in China, and Traditional Chinese, mostly used in Taiwan. And a very new one about Punjabi, Punjabi being a language from India and Pakistan that is written differently in each. So these are the three strong exceptions you may find.

Walter was asking, what about British and American English? Actually, there is no big effort to separate the two translation efforts, except maybe in some upstream translations. So yes, it could be an exception here. Maybe someday we'll have a D-I with British English and another one with American English. I would say it's probably a wasted effort, but anyway.

So first of all, I will try to translate Hello to Klingon. Yeah, it will be hard. So the first question we have is: what is Klingon's ISO code? Because we have to use ISO codes. Oh, sorry, I was not supposed to go back. The good way to do it, I just typed the command line, I won't type it again. You will believe me. It's just a grep in the iso-codes package. We have a very good and very well-maintained iso-codes package in Debian, which contains pretty much everything, including language codes. And looking there, you will find that the language code for Klingon, because it exists, Klingon is a registered language, is tlh.
Some languages, usually the most uncommon ones, have three-letter codes and not two. Most of the time, and for most of the supported languages, you will find two letters. Okay, and then we just copy, well, the POT file to tlh.po and then start working on it. So we try. I already copied the file, so I just have to start working on it and just get a reminder of what I'm supposed to do here, okay. I created a PO file for Klingon. So we see that the Klingon translator, with a very well-known name, actually it's my own name in Klingon, did some work. There is, yeah, there's a Klingon team also, with a very strange address at WM.TLH. So now we have a domain in Klingon. He uses UTF-8 of course, because writing Klingon needs UTF-8. Well, this Klingon translator's work actually looks a little bit Finnish, but I know, I did it. But that was the point.

So if I want to translate, I just fill in this thing. Okay, well, I'm filling in the wrong place, because I'm just breaking the, well. Typing without looking at the keyboard is not that easy. So if I do this, as my translation is outside the quotes, I'm just messing up the PO file. So, not a very good idea. So the point was: don't edit PO files, don't translate, without a dedicated tool. So I just exit the thing and I just run, if I can see my fingers, because I can't type without seeing my fingers, I'm a bad, bad, bad typist. Oh, KBabel. I use KBabel. There are a few such tools. And, well, I broke it. Okay. I think I remember it was on purpose, but I don't remember why. So anyway, please use dedicated tools. Okay, I hope, I'm supposed to use KBabel later. We'll see. I said it was badly prepared. No one has left yet?

Okay, so the main point was: properly fill in the header fields that are yours and do not change the other fields. This is again a message to software authors, but also to translators. POT-Creation-Date, for instance, should not be changed. This belongs to the author, not to you as a translator.
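Starting a new translation from the template, as described above, can be sketched as follows. This is an illustration under assumptions, not the speaker's method: it assumes the template header carries the usual placeholder values (`FULL NAME <EMAIL@ADDRESS>`, `CHARSET` and so on), and the function name, translator name and addresses are invented. The point it shows is which header fields belong to the translator and that POT-Creation-Date is left exactly as the author's tools wrote it. In practice a dedicated tool such as gettext's `msginit`, or a PO editor, does this job.

```python
from datetime import datetime, timezone

# A made-up template header in the usual placeholder style.
POT_TEXT = r'''msgid ""
msgstr ""
"Project-Id-Version: hello\n"
"POT-Creation-Date: 2006-05-01 09:00+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Content-Type: text/plain; charset=CHARSET\n"

msgid "hello, world\n"
msgstr ""
'''


def start_translation(pot_text, translator, team, charset, now=None):
    """Fill in the translator-owned header fields; touch nothing else."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M%z")
    replacements = {
        "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE": f"PO-Revision-Date: {stamp}",
        "Last-Translator: FULL NAME <EMAIL@ADDRESS>": f"Last-Translator: {translator}",
        "Language-Team: LANGUAGE <LL@li.org>": f"Language-Team: {team}",
        "charset=CHARSET": f"charset={charset}",
    }
    for old, new in replacements.items():
        pot_text = pot_text.replace(old, new)
    return pot_text


# The result would be saved as tlh.po next to the template.
tlh_po = start_translation(
    POT_TEXT,
    translator="A Translator <translator@example.org>",
    team="Klingon <team@example.org>",
    charset="UTF-8",
    now=datetime(2006, 5, 15, 10, 0, tzinfo=timezone.utc),
)
print(tlh_po)
```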
Okay, so we are supposed to have produced the tlh.po file which, as a translator, I can send to the software author or to the BTS or whatever. I had better check it before I send it blindly. So of course I'm making typos. Sorry for your loss of view. Dot po, okay. And of course it doesn't work, it was on purpose. Okay, I get a strange message telling me that on line number 30, the msgid and the msgstr do not both have backslash n. Okay, let's correct this, and not with KBabel because it will fail. So by hand. So go to line 30. This is here, oh, okay. Yeah, this is a Swedish version. Bork, okay. So let's fix this. It's fixed and now, oh, yeah. I fixed the quote, yes. I'm sorry, because just with, ah, this is a very good demonstration. Yeah, I think, I hope you enjoy. "Maybe you should use vi." Yeah, maybe I should use vi, because I completely screwed up the thing, yeah? Yeah, but I screwed up the other one. So I'm completely screwed up currently. So I will jump over this thing and I will show you something else. I guess, and as I screwed up the file, you will see that, wow.

Okay, the point was that msgfmt is a very good tool to check your PO files. And actually the point was made, because it's a very good check. So if I check the work of our French translator, okay, it's not good. With the Swedish one, maybe? Because I screwed up a lot of files just for the demo. Oh, okay, the Swedish one is correct. Okay, let's see something else. Is it complete? Again, we use our magic msgfmt tool. And it says there are 27 translated messages. Everyone says, ah, no, everyone. I have a strange laptop here, here we are. Okay, so it's a complete translation. It's correct because it survives msgfmt. Good, we have a good translation. The Klingon one is screwed. Let's wrap up and continue. So we talked about this msgfmt utility, which is very important, and it has a lot more switches.
But the most important ones are --check and --statistics. Actually, on the slide it should be a double hyphen. We have seen fuzzy; well, we haven't seen fuzzy, because I was supposed to show you msgfmt telling us there were fuzzy and untranslated strings. Fuzzy strings: does everyone know what a fuzzy string is? Who doesn't know what fuzzy is? Yeah, okay, barely anyone. Fuzzy just means that the translation is not completely correct. It was correct, the original string changed, and then we still have the translation, but it has to be updated. It won't be used, so the translation is incomplete. Okay, so when we change upstream strings, ouch: we fuzzy translations and we make translators mad. So don't change them too often, please.

There are a few other useful, excuse me, yeah. I repeat the question. The question was, what do you do if msgfmt reports that the translation file is incorrect? No, please don't fix it. Use the Last-Translator field, write him or her a mail, and please ask him or her to fix it. Only fix it if you are completely sure of what you have to do. I won't recommend fixing translation files yourself. Why? Because you're very likely to screw up encodings. If you use, just like I did, the wrong editor, you have a very good chance of breaking files. So do it only if you are completely sure that you won't break it, and I won't recommend it.

So a few other utilities. I go quickly because I'm running out of time, of course, because I was too long. msgcat. msgcat is supposed to reformat or concatenate two PO files. It's a good utility to merge messages from two origins together. So I just jump over my example because I would be too long. msgmerge is very useful for software authors. Yeah, I'm running out of time, my God. It merges into a PO file the English strings coming from another file. Well, that's really not easy to explain, but if you update the English strings, you have to update the PO files. And this is done with msgmerge.
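To make the two checks concrete, here is a small Python sketch of what a statistics count and a newline check look at. It is not msgfmt and does not reproduce its exact rules or messages; the tiny parser handles only plain msgid/msgstr entries, and the sample file and its translations are invented. It assumes, as a simplification, that PO string escapes can be read with Python's literal rules.

```python
import ast

SAMPLE_PO = r'''
msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\n"

msgid "hello, world\n"
msgstr "bonjour, le monde"

#, fuzzy
msgid "Print a friendly greeting."
msgstr "Affiche un message"

msgid "Usage: %s [OPTION]...\n"
msgstr ""
'''


def parse_po(text):
    """Read msgid/msgstr pairs and the fuzzy flag; the header entry is dropped."""
    entries, current, field, fuzzy = [], None, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#,"):
            fuzzy = fuzzy or "fuzzy" in line
            continue
        if not line or line.startswith("#"):
            continue
        if line.startswith("msgid "):
            current = {"msgid": "", "msgstr": "", "fuzzy": fuzzy}
            entries.append(current)
            field, fuzzy = "msgid", False
            line = line[len("msgid "):]
        elif line.startswith("msgstr "):
            field = "msgstr"
            line = line[len("msgstr "):]
        current[field] += ast.literal_eval(line)
    return [entry for entry in entries if entry["msgid"]]


def statistics(entries):
    """Count (translated, fuzzy, untranslated) entries."""
    fuzzy = sum(1 for e in entries if e["fuzzy"])
    untranslated = sum(1 for e in entries if not e["fuzzy"] and not e["msgstr"])
    return len(entries) - fuzzy - untranslated, fuzzy, untranslated


def newline_mismatches(entries):
    """msgids whose translation disagrees about a leading or trailing newline."""
    return [e["msgid"] for e in entries
            if e["msgstr"]
            and (e["msgid"].endswith("\n") != e["msgstr"].endswith("\n")
                 or e["msgid"].startswith("\n") != e["msgstr"].startswith("\n"))]


entries = parse_po(SAMPLE_PO)
print("translated, fuzzy, untranslated:", statistics(entries))
print("newline mismatches:", newline_mismatches(entries))
```

On the real tool the equivalent would be running msgfmt with `--check` and `--statistics` on the PO file, as the speaker does in the demo.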
msgattrib is very useful to manipulate PO files, for instance to remove all the fuzzy states, all the obsolete strings, and so on and so on. Pretty much all the utilities begin with msg: msgfilter, msg-whatever. So a good idea when you're looking for something to work on PO files is just typing msg and Tab, and you get pretty much all the utilities. And their man pages are very well written, so it's a good idea to use them and just figure out what they can be applied to. msguniq: when a PO file contains two translations of the same string, gettext will complain and refuse to use it. So msguniq is used to sort out these issues. And a very new one, which I will hopefully document in the Developer's Reference and try to get into some package somewhere, is msguntypot. Actually, it doesn't exist anywhere except on my laptop and Martin Quinson's laptop. It's used to deal with typo corrections. It helps correcting a typo in an English string and fixing all the PO files without breaking them. That's a good new utility.

Debconf translation. You mean I'm 15 minutes from the end. Wow. Okay, debconf translation. It's basically something that lies in the debian directory of Debian packages. I'll try to show you, very badly because of the display. Yeah, we have a lot of things here, and I made a few templates. Yeah, I can, yeah. What did I do? I'm the deep hacker, you know. hello.templates, that's very important and this is the one I wanted to show you. I'm afraid I will run out of time for the man page thing, but people interested will come to me and we'll talk about it.

Anyway, templates; again I'm using Emacs. The most important thing I wanted to say is how to translate Debian templates. This is this very nice trick introduced by Denis Barbier three years ago: just put an underscore before some fields in the templates file to mark them as translatable, okay? So these are basically random templates. There are no templates in the hello program anyway.
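The underscore trick can be illustrated with a short Python sketch that lists the fields marked as translatable in a templates file. The templates below are invented, like the random ones in the demo, and the function is a simplified reader written for this illustration; it is not how po-debconf itself is implemented, and it does not split short and extended descriptions or handle every field type.

```python
SAMPLE_TEMPLATES = """\
Template: hello/greeting
Type: string
Default: world
_Description: Name to greet:
 Please enter the name that the program should greet.

Template: hello/useless-note
Type: note
_Description: Hello is now installed
 This note says nothing useful.
"""


def translatable_fields(text):
    """Return (template, field, value) for fields marked with a leading underscore."""
    results = []
    template = field = None
    for line in text.splitlines():
        if line.startswith("Template:"):
            template = line.split(":", 1)[1].strip()
            field = None
        elif line.startswith("_"):
            name, value = line.split(":", 1)
            field = [template, name.lstrip("_"), value.strip()]
            results.append(field)
        elif line.startswith(" ") and field is not None:
            # Continuation line of the marked field.
            field[2] += "\n" + line.strip()
        else:
            field = None
    return [tuple(item) for item in results]


for template, name, value in translatable_fields(SAMPLE_TEMPLATES):
    print(f"{template} [{name}]: {value!r}")
```

Fields without the underscore, such as `Type` and `Default` here, are left out, which is the whole purpose of the marking.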
So the idea here was just to show how this may be done, and with a nice trick: look at the man pages, look at the debconf man page and the po-debconf man page. They are very well written, with explanations about how to format debconf templates. This is a very good example of debconf abuse, by the way: a note which basically says nothing. We have tons of these in our packages inside Debian and it makes translators very angry.

Okay, so now I'm completely lost. Well, I was about to show you what happens when a Debian maintainer changes templates, which always makes me very mad, especially when it is for bad reasons. So I'm supposed to show you that the French translation of these templates was nearly complete, because the French translator translated five messages but didn't translate the stupid note, okay? And now, maybe you haven't seen it, but there was a typo in this template somewhere. No one noticed, and I don't even remember where it is, and there are maybe others, or bad English or whatever. Okay, if I update with this neat utility named debconf-updatepo, yes, good, that's a good question. Command not found, okay. I'm very good, there are a lot of things around. This tutorial sucks. Anyway, now if I go to the po directory; postat is something of my own, to avoid typing too much. As you see, I'm very bad at typing. And now I broke it: I fuzzied one translation just because I fixed a typo, I broke the translator's work. So of course fix typos, but please try to find a way to deal with it. And we will try soon to write something about it in the Developer's Reference, to handle this properly.

Okay, here and now I have a few translations around and I screwed them up. So what should I do? I can use this very neat utility which is named podebconf-report-po. It comes from the po-debconf package. And then we will see that I'm in a kind of mail template.
The purpose is to send a mail to translators and request them to update their translations. This mail will be sent to the following people: Mr. Christian Perrier, with a nice error, we have to deal with it, Denis. And Mr. Bork Bork, who happens to be the Swedish translator, of course. And this mail will be sent to these people. Of course not, I won't send it. Mr. Bork won't like me. And it requests them to update the translation. Yes, Frans?

"Why is the message sent to individual translators instead of the translation teams?" Why is the message sent to the individual translator? This is the default option. But this utility has tons of nice options that Denis added. It may be sent to the team, to the team and the last translator, whatever you want. The default is to use Last-Translator because it's pretty much the most reliable information, more so than what translators put in their Language-Team field. Most translation teams use mailing lists. And the mailing lists are not open for posting. So you will end up... I did. And most of my posts are rejected. I use this utility very often. Yes, Leo. Yes. And most languages don't even have a team. So translators put pretty much anything in those fields.

Okay. Oh, yeah. So I'm completely lost. I will cut this down, yeah. Okay, this was lesson number four. I will probably end with lesson number four. So you won't have all the remaining nice, cute things I had. My point was: write, please write good English. This is very important in the original templates. Have you seen? I made a mistake in my templates and then I sent them. I didn't notice, and then I broke all the translations. There's a nice mailing list, very little used, which is named debian-l10n-english, which is aimed at being a kind of way to request a review. Please use it. I don't know if many, many people read it, if there are many, many English "translators". I know that Brandon does. Yes, I sent some of my templates for review.
And he reviewed them. Please use a consistent style. I didn't insist on it, but there's a very good chapter in the Developer's Reference about it. I tried to write some good advice about it. And make more and more use of common templates. We translators are so pissed off at always translating the same thing, especially for a few kinds of packages, such as those which deal with web things and database things. There are about 50 or 60 templates to translate in a lot of packages, just asking you for the database administrator password. There is a very cute package named dbconfig-common, which just appeared in the archive. And all people dealing with MySQL or PostgreSQL should use it.

I think I will break off here. I have tons of other slides. So yeah, I still have five minutes left. Thank you. I would first like to know if there are some questions in the audience. And if there are none, I will continue. Okay, there they are.

"Yeah, I want to know, if you have a program with several short sentences that are not really understandable in the PO file, but only in the context of the program, how can you manage that in the communication between upstream and the translators?"

Well, how to deal with these very short and often jargon-laden strings we have in our programs. This is an important problem, which was raised recently by Clytie Siddall, the Vietnamese translator. There is basically no very good solution. Or maybe there's one. "Debconf, yeah. Yeah, debconf supports adding notes that describe how the sentence is used." Yes, inside debconf. "Yeah, no, I didn't mean debconf. The PO format, gettext." Yes, of course. So you can actually write notes. I can make some kind of illustration of it, because I had the example, of course, for it, and. Okay, I use the... Okay, I blocked the thing, but if I go down enough, remember this comment I showed you: it appears in the PO file, because the software author cared enough to put in some comments to explain what this may mean.
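In the PO format, comments that were extracted from the sources for translators appear as lines starting with `#.` just above the msgid. The following Python sketch collects them per message. It is an illustration only: the sample entries and the file references in them are invented, the function name is hypothetical, and the reader handles single-line msgids only.

```python
SAMPLE_PO = '''\
#. TRANSLATORS: this is a button label, keep it under 10 characters.
#: src/ui.c:42
msgid "Apply"
msgstr ""

#: src/ui.c:57
msgid "Quit"
msgstr ""
'''


def extracted_comments(po_text):
    """Map each single-line msgid to the author's comment placed above it."""
    notes, pending = {}, []
    for line in po_text.splitlines():
        if line.startswith("#."):
            pending.append(line[2:].strip())
        elif line.startswith("msgid "):
            if pending:
                notes[line[len("msgid "):].strip('"')] = " ".join(pending)
            pending = []
    return notes


notes = extracted_comments(SAMPLE_PO)
for msgid, note in notes.items():
    print(f"{msgid!r}: {note}")
```

A message without such a comment, like "Quit" in this invented sample, is exactly the kind of short string a translator has to guess at, or whine about.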
This is quite uncommon, of course. This is not a very easy task, because I can't request Scott Remnant to add tons and tons of explanations about what each string means in dpkg. There are about 1,000 strings in dpkg. So when some string is very, very confusing, this is my lesson number one: please whine, please ask the software author to comment it. Excuse me? This was a remark about dpkg messages no one understands. Well, there are a lot of them, yeah. And I'm afraid that the translations are not understandable either, anyway. Frans?

"One problem with translations is that often the room reserved for a string, especially in GUI applications, is not long enough for translations. It may be just long enough for the English explanation that's supposed to go there, but if you do a translation, it will often run longer than the original English text. Aptitude is a good example of that. The status bar there has room for the size of the download and the size of the packages to be installed. The problem is that if you translate that to Dutch, it will just not fit. And I think that will go for a lot of languages. And I think software authors should be a little bit more aware of that and try to leave some white space that can be used up by translators."

Or at least reserve enough room and do not compact the screens too much, or whatever. Unfortunately, I'm not sure there are very good solutions for this. And I expect the most important thing is probably commenting. If the aptitude author has some strings which are limited in size, that should be commented. If it's not, pretty much anything can happen.

"The same goes really for the use of abbreviations. Some things can be very nicely abbreviated in English to two letters or something like that, and most people understand what you are saying, but it may not be possible in translations to find a two-letter abbreviation that gets understood by the people in that language.
One more thing is that some software authors use partial strings that are not full sentences, only parts of sentences. Please do not do this, because it may be very hard to translate them into other languages that have a different structure than the English language."

Yeah, don't assume that all languages are structured just like English. This is important. Don't assume there are only two ways to make plurals. There are many ways. There is everything needed in gettext to handle this. So don't assume there are only two forms, made just by adding an "s". But I think that the message went across quite well. I think we're running out of time. No, the way? I'm a bit lost, yeah.

So thank you so much, people, for attending this very bad tutorial. I would need two or three hours to do it completely. If some people are interested in man page translation, which I didn't cover, please come to me and we'll look at it, at the two nice slides I made. Thank you.
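The plural point from the closing discussion can be sketched with Python's standard gettext module, which exposes the same `ngettext` idea as GNU gettext. In this sketch no translation catalog is installed, so `ngettext` falls back to the English rule; the second function spells out the commonly published three-form rule for Polish, purely to show that "add an s" does not generalize. The function names and the Polish word forms are chosen for illustration.

```python
import gettext


def files_message(n: int) -> str:
    """Let gettext pick the plural form instead of hand-adding an 's'."""
    return gettext.ngettext("%d file", "%d files", n) % n


def polish_plural_index(n: int) -> int:
    """Index of the plural form in the commonly published rule for Polish."""
    if n == 1:
        return 0
    if n % 10 >= 2 and n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


POLISH_FORMS = ("plik", "pliki", "plików")

for count in (1, 2, 5, 12, 22):
    print(files_message(count), "|", count, POLISH_FORMS[polish_plural_index(count)])
```

In a real program the rule lives in the Plural-Forms header of each PO file, so the source code only ever calls `ngettext` with the count and never builds plurals or sentence fragments by hand.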