OK. We can have a quick talk about the localization process and how it works. Some parts are specific to OpenOffice, others are generic. A couple of words about me: I'm active in several open source projects as a volunteer in my spare time. My job is completely unrelated to OpenOffice and to any other open source activities. I served as project chair and release manager of OpenOffice in past years, and I'm now just an ordinary project member focusing on the Italian dictionary and on the localization process in general.
Why is it a key field? Because a lot of new people join the project by starting with localization. A typical request we receive is from a volunteer who wants, in some undefined way, to help the project by contributing to localization, rather than starting with development or something else. Not starting with documentation, because apparently that's still problematic in some respects; usually they just want to see OpenOffice translated into their language. There are a lot of political factors behind this, and other reasons that can be really motivating for a new volunteer.
Actually, "adding a language" to OpenOffice doesn't mean anything in itself, since adding a language means a lot of different things. There are document languages, there are interface languages and, in the OpenOffice case, there are the active Pootle languages. Each of them is different.
The document languages are of course the most official part. If I want OpenOffice to support a new language, then we refer to international codes and standards. And not only real standards: for Microsoft Office compatibility we sometimes have to treat the Microsoft mapping of languages as a de facto standard for better interoperability, and the same goes for adding new languages so that they work the same way between, say, OpenOffice and LibreOffice. So this entails a bit of paperwork.
It's rare that a volunteer is well prepared for this, but it is the first basic step: a few source code files change, and at the end OpenOffice knows that a language exists. This list is used, for example, if you go to Format > Character > Font, where you have a list of all available languages for your installation, and for the spell check, meaning theoretically supporting a spell checker for that language. Otherwise it will not work at all. You need to bind the spell checker to a language, and this is the basic step. Unless your office suite supports the language, you will not be able to use it, or you will have to do horrible hacks that fortunately we don't have the time to cover here.
Then there are the interface languages. Interface languages means of course having OpenOffice translated into your language. In some very rare cases, most of them, say, politically motivated, people want to translate OpenOffice into a language even if it is not officially recognized as a language, or even if our support of standards is insufficient for that aim. In this case we do have a couple of languages that are only interface languages, meaning that users cannot write a document in the same language, or they cannot have a proper spell checker in the same language and they have to abuse another spell checker. That's the idea, basically: you create an Italian dictionary, but you put Catalan words into it and you say it's a Catalan dictionary. It works like a Catalan dictionary, but if you then open your document on another installation it will be interpreted as a document written in Italian. So things get very complex; otherwise this is just a normal second step. We know the language, we support it in the interface.
Then there is this third step for OpenOffice, since out of about 120 interface languages that do have a partial translation and that are in our source code, we release about 41.
"About" meaning that there is one special language, called the KeyID language, that is just a fake artificial language used to give a unique identifier to every string. You get English plus a number, so that the "Open" in File > Open can be distinguished from the word "Open" appearing somewhere else in the interface. The first one gets annotated as, say, "Open123" and the other one will be "Open234".
Pootle, our web translation service, supports these 41 and a few more. The other partial translations are in the source code but have not been imported into Pootle yet. So when we have a new volunteer for language X, if X is in Pootle everything is very easy, and within 24 hours the person will be able to actually start working. Otherwise we have to create the language in Pootle, and that is handled separately.
Now we need a very quick overview of the tools. There are too many of them, and they are probably too poor for the purpose, but this is the current situation. Pootle is the user-facing interface; to us it is just a way to collaborate on online editing of PO files. We don't use any exotic features; this is just the basic purpose of Pootle. We do use some command-line tools on the Pootle server that help with management. Some teams actually prefer to work offline. They just download their PO files, and they could do without Pootle completely. This is fine so long as everyone in the team is willing to do that. In the case of the Apache Software Foundation we have one Pootle installation for all projects; OpenOffice is only one of hundreds of projects at Apache. Committers from any project can log in directly; other volunteers need an account created, but this is the only overhead we have.
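As a rough illustration of the KeyID idea mentioned above, the following sketch appends a number to every English string so that two identical words in different places become distinguishable. The function name `annotate_with_key_ids`, the location keys and the sequential numbering are all assumptions made for this example; the real KeyID build computes its identifiers in its own way.

```python
def annotate_with_key_ids(strings: dict[str, str]) -> dict[str, str]:
    """Append a unique number to each string, in the spirit of a KeyID language.

    Hypothetical scheme: locations are sorted and numbered from 1, so the
    numbers are unique but would shift whenever a string is added or removed.
    """
    annotated = {}
    for number, location in enumerate(sorted(strings), start=1):
        annotated[location] = f"{strings[location]}{number}"
    return annotated


# Two interface strings with identical English text (illustrative locations).
ui_strings = {
    "menu/file/open": "Open",
    "dialog/hyperlink/open": "Open",
}

key_id_strings = annotate_with_key_ids(ui_strings)
for location, text in key_id_strings.items():
    print(location, "->", text)
```

A translator or tester who sees one of the annotated strings in the running interface can then tell exactly which entry it comes from. A production scheme would presumably want identifiers that stay stable across releases, which simple sequential numbering does not give.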
So the Pootle part is the nice part that users and volunteers see, but behind that there is this obscure SDF format. It is simple, very simple actually: it is just a set of fields separated by tabs, which is fine as long as you have three columns of data, but when you have dozens of columns of data it's just impractical. It is used by the build system, and it is specific to OpenOffice, and to the historic tradition of the OpenOffice code to be precise. One interface translation is one SDF file, so basically the Italian translation is one file that can be found online, if you'd like to crash your browser, here; it is 2 megabytes and about 73,000 lines, or strings. It is highly impractical for an editor and for anything else, even though it is theoretically possible to go there, edit one string, save and rebuild OpenOffice, and it will work. But as we will see it will not last long, since it will get overwritten.
Let's just see the history of a string as a practical example: the full process to get a string translated. Of course you need to mark content as translatable. I will skip some details, but basically you have resource files, and you should use resource files to define a string. So this is the menu string saying "Convert Text to Table". One first major issue: this is the English string, and English is written by programmers who are often not native speakers. We had native English speakers come to us in past years and say that some expressions were like the language of a lawyer from the 18th century, but it was just what the non-native programmer had written in the code.
Second step: you extract these localizable strings, and basically you make a fake translation that is the English reference, a template. So one huge SDF file is for the English language, and it is extracted directly from the resource files.
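To make the tab-separated idea concrete, here is a minimal sketch of reading one line of an SDF-like file. The five-field layout, the field names and the sample identifiers are assumptions chosen for readability; the real SDF format has many more columns, which is exactly why it is awkward to handle by hand.

```python
from typing import NamedTuple


class SdfRecord(NamedTuple):
    """Hypothetical, simplified record; the real SDF has many more fields."""
    source_file: str
    group_id: str
    local_id: str
    language: str
    text: str


FIELD_COUNT = len(SdfRecord._fields)


def parse_sdf_line(line: str) -> SdfRecord:
    """Split one tab-separated line and fail loudly if the field count is wrong."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != FIELD_COUNT:
        raise ValueError(
            f"expected {FIELD_COUNT} tab-separated fields, got {len(fields)}"
        )
    return SdfRecord(*fields)


# Illustrative English template line (file name and IDs are made up).
sample = "example/menu.src\tMENU_TABLE\tCONVERT_TEXT_TO_TABLE\ten-US\tConvert Text to Table\n"
record = parse_sdf_line(sample)
print(record.language, record.text)
```

Rejecting a line with the wrong number of fields at parse time is a deliberate choice here: one stray or missing tab shifts every following column, and that kind of error is much cheaper to catch before a build than after it.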
The third step: from the huge files you generate the PO files, which are the, say, industry standard for translation, or one of the most common formats for translation, and then you upload them to Pootle. Then you actually start translating and do the translation, the only part that a volunteer is normally looking at. And then you go all the way back: from the PO files you go back to the SDF files, and then you rebuild OpenOffice, and at that point it will pick up the new translations.
We still have two or three minutes for describing possible improvements, if it's clear so far. First, it is a process that can break at too many points. Second, we do have current issues at the moment. For example, the last Pootle update was done a long time ago, so we are unsure of what translators are actually translating at the moment. The strings will be mostly good, but there is a delta with respect to trunk, which has different resource files, and that would need a full update. There is a tool to check validity, but you can get a broken build if you have an invalid SDF file, and fixing it is slow and painful. Pootle can do some basic checks, but they are not really reliable. One other issue is that creating a new language from the current trunk can be done, but we would need to synchronize all languages first.
How can we get out of this? Once the current way of working is fixed, one possible major step for the future is to just get rid of some intermediate formats. Other projects have already done it. The intermediate format is very error-prone; it could just be skipped, and we could settle on one standard format, just use it, and try to support it as much as possible.
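The round trip described above (English template to PO, translate, then back to per-language records) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual OpenOffice tooling: the context keys, the helper names `to_po_entry` and `apply_translations`, and the Italian string are all made up for the example, and only the `msgctxt`/`msgid`/`msgstr` keywords come from the PO format itself.

```python
def po_escape(text: str) -> str:
    """Escape backslashes and double quotes for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_po_entry(context: str, source: str, translation: str = "") -> str:
    """Render one PO entry; an empty msgstr means 'not translated yet'."""
    return (
        f'msgctxt "{po_escape(context)}"\n'
        f'msgid "{po_escape(source)}"\n'
        f'msgstr "{po_escape(translation)}"\n'
    )


def apply_translations(template, translations, language):
    """Go back from translations to (context, language, text) rows.

    Untranslated entries are skipped, so the build would keep English for them.
    """
    rows = []
    for context, _source in template:
        translated = translations.get(context)
        if translated:
            rows.append((context, language, translated))
    return rows


# English template as (context, text) pairs; contexts are hypothetical.
template = [
    ("example/menu.src#CONVERT_TEXT_TO_TABLE", "Convert Text to Table"),
    ("example/menu.src#INSERT_TABLE", "Insert Table"),
]

# Forward direction: template to PO text that a translator would edit.
po_text = "\n".join(to_po_entry(context, source) for context, source in template)
print(po_text)

# Backward direction: one entry has been translated (illustrative Italian).
italian = {"example/menu.src#CONVERT_TEXT_TO_TABLE": "Converti testo in tabella"}
italian_rows = apply_translations(template, italian, "it")
print(italian_rows)
```

Keying both directions on the same context string is what lets the translation find its way back to the right place. Every extra intermediate format adds another conversion like this one, and another place where the keys can drift apart.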
There are many other improvements, like creating an internal language that is the "programmer's English", as they call it, and getting a real English translation out of it, so that fixing a typo in English does not require updating all the other languages.
Then of course one thing that we will need to work on is more automation. One of the projects, started by Jan Iversen years ago, never integrated and not in active development at the moment, was called genLang, and it had many workflow advantages in its design. The interesting approach is that many of the steps that I've described as manual so far can be automated, and it makes sense to try to automate them as much as possible, especially because our tools are not very robust. To give you a stupid example: when you create the first extraction here, if you don't give it a full path the tool will break silently. There's no reason for it at all, of course; it's just a buggy system that we checked and polished a bit. So the more we can automate and refine the tools, the better for the future.
Just in time. Thank you, and I'll let Peter go on, but I will be available for any other questions, or while we switch presenter if that's better.