Okay, hi everyone, sorry for the short delay. Today we will listen to a talk about continuous localization, a very important topic, and we have Dwayne and Ryan to talk about it. What can I say: Dwayne has been involved with open source since 2001 and has worked on Pootle and the Translate Toolkit, and Ryan works with him. Let's give them a round of applause.

Thank you. Thanks very much for having us. So I'm Dwayne, I work with Translate House and I've been working on Pootle for quite a while; I'm one of the founding creators of the software, and Ryan is the lead developer at the moment. Our focus today is continuous localization, both in terms of how it can be done and some of the things we need to take forward in open source. Just a little bit of background on us as an organization and what we've been involved in. I'm from South Africa, and I started working on the official languages of South Africa, translating open source software, basically because it was the most powerful vehicle we had to influence software and to change the way software worked. In fact it was quite critical in getting companies like Microsoft to actually translate their software: by translating something like OpenOffice we could influence the wider market. That moved into an Africa-wide network looking at issues of localization across the continent, and at the moment we focus on worldwide issues, trying to help teams that are localizing across the world. The reason we got into tools and tooling was really to address some of the shortcomings we saw in open source, so every tool that we developed was trying to tackle something. Pootle specifically was designed because we were struggling to get good tools into the hands of localizers, and we built things like the Translate Toolkit, which does quality checks and format conversion. So that was also about enabling localizers.
We have a tool called Virtaal, which is an offline translation tool, and then amaGama, which is a massive free and open source software translation memory. 2004 was when we launched Pootle; my daughter was young then. Pootle is a teenager now, just like my daughter: it's grown up, it's stronger, but it's a little more opinionated as well. I'm a bit of an accidental localizer. My training is in mechanical engineering, and I got involved in localization because of the impact it had on people. We've seen some of how the process evolved. I'm not a localizer from the 80s, but the systems we've had in place, and that some projects still run, amount to getting work via email, translating it, sending it back for review, sending it back to be fixed, landing it, and then a few months later actually releasing it. But things have changed over the years. The first translation of Microsoft software into a South African language would never actually be revised again, so if there were mistakes they wouldn't be fixed; they would just roll over to the next release. That's very much an old-school view that people have moved away from. We were in a space where people were translating 10 or 15 languages; if you were a rapidly progressive organization you'd probably be doing 35. But then we start seeing, even with Ubuntu, translations being exposed to other people and becoming a little divorced from the actual release; we see Microsoft introducing the language pack concept; and quite quickly we see Facebook and Twitter doing massive community-based localizations, looking at hundreds of languages. So in the present day we are looking at translation processes that need to deal with lots and lots of languages, and that's the reality of a changing world.
We're having to deliver rapidly, think about mobile and web apps, with multiple different targets. You've got new markets: only around 20% of the web are native English speakers, and the rest is growing. Other languages are growing by double-digit percentages every year: Russian, Chinese, Spanish are counted there as well. English is growing, but not as fast as these other languages, and more and more people are coming on board with mobile. In fact I think 2013 was the year in which more people were using mobile to access the web than were using desktops. So a very dynamic change in the environment, and that's why we're having to look at continuous localization, certainly in the open source space. I just wanted to quickly valorize localizers, because I don't think we spend enough time focusing on the effort they put in, and that's part of why we want to look at continuous localization: to assist them. Often localization processes are really built around the developers; they're built to support the development process. It's time that we honor the people who work, often across multiple projects, multiple formats, multiple tools, et cetera: a kind of environment that developers are not really used to. These people are really passionate about advancing their language, and that's some of the motivation for why continuous localization is critical: it's about how we make these people really effective. There are processes we've seen over the years that we want to try to fix, and we think continuous localization is part of what fixes them. Strangely enough, while continuous localization sounds like things simply moving more quickly, we really feel that some of the ways you can approach it will actually reduce the load on translators.
Really focusing on which things are more important than others; providing much more context, so translators get context they might not usually have; having a strong focus on quality, in the same way that continuous integration runs checks; having really quick processes that allow strings to be turned around very quickly; and actually breaking the concept, which I think a lot of people have lived with for too long, of string freezes, the deadline after which you can't change things, and instead creating a system that is much more dynamic. The key ideas for us in continuous localization are really about reducing the friction in the process. Which of the processes require manual intervention, or intervention that isn't helpful to localization? How do you oil those well, how do you eliminate them, how do you reduce their cost? Obviously reducing and eliminating manual processes, because they become blockers. Organizational memory is a critical thing, and Ryan will talk much more about it: how do we learn, and continue to learn, within this phase of translation? It's quite critical in open source because we have a flow and movement of translators all the time. Then visibility of progress, so people can see what they're working on and what they should be working on. A critical thing for me as well is how we empower the right people. Expecting developers to write the comments about the strings that need translating, when a translator might have much better context and a much better idea of how a string impacts their language: I'd rather empower the translator there. The other thing we see continuous localization helping with is the idea of releasing early and often: working on processes that push languages out as quickly as they are ready, and doing that continually, which is more a reflection of reality.
I just wanted to quickly lead into things we feel people should be doing right now, if you're not doing anything around continuous localization. The first, most critical thing, and we see it too often, is that bad and broken translations can break products. We want to catch those problems earlier, but as a first step, nothing that you ship should be able to break the product because it's mistranslated in some way. I mean actually breaking it: you get a traceback or a crash, and that happens too often; we're still in a space in open source where we sometimes allow that to happen. So we need to stop that. If you are currently using continuous integration, and I doubt there are many people who don't, it's critical that you start making localization part of it: catching localization-related errors, whether it's breakages in the translations that you can automatically detect, or strings that you haven't extracted. Those kinds of things need to be put into your process right now. And I'd recommend people look at one of our tools, the Translate Toolkit, and start automating some of that in the build environment, so that translations that would break the product never enter the build and never get out. I'm going to hand over to Ryan, who's going to look at this much more from a continuous integration model.

Yeah, hi. So we've both worked with software for years, and one thing we saw very clearly was that using continuous integration techniques allowed us both to improve the quality of the software and to speed up the process. And so the big question for us really was: can we apply those processes to localization?
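The kind of build-time gate described above can be sketched in a few lines. This is a simplified, hypothetical stand-in for what tools like the Translate Toolkit automate, not their actual API: it checks that printf-style variables survive translation before a catalog is allowed into a build.

```python
import re

# printf-style placeholders such as %s, %d, %(name)s
PLACEHOLDER = re.compile(r"%(?:\([^)]+\))?[sd]")

def placeholders(text):
    """Return the sorted list of printf placeholders in a string."""
    return sorted(PLACEHOLDER.findall(text))

def broken_entries(entries):
    """Return (source, translation) pairs whose placeholders differ.

    A mismatch means string formatting would fail (or crash) at
    runtime; untranslated entries are skipped.
    """
    return [
        (src, trans)
        for src, trans in entries
        if trans and placeholders(src) != placeholders(trans)
    ]

catalog = [
    ("Hello, %s!", "Bonjour, %s !"),            # placeholders match: fine
    ("%d files copied", "%s fichiers copiés"),  # %d became %s: would break
    ("Quit", ""),                               # untranslated: skipped
]
assert broken_entries(catalog) == [("%d files copied", "%s fichiers copiés")]
```

In a CI job, a non-empty result would fail the build, so the broken translation never reaches users.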
And a similar question we've asked is: are there processes specific to localization that could also feed back into continuous integration and continuous localization? Within localization we obviously have quality checks, and as Dwayne mentioned just previously, the Translate Toolkit provides many such checks. The ultimate aim is to prevent bad strings from ever reaching the build process, so they don't break builds and don't break the built product. But continuous integration is not just about fixing bugs, it's about improving the process: stepping back and asking what caused a bug, and putting in place a process to prevent it; and taking a further step back and asking what causes that class of bugs, and how we can prevent the whole class from happening. Within localization we already have the quality checks, but they provide more than that: it's not simply about whether a build passes or fails, it's also about whether developers and localizers have sight of change in the process.
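One way to read "preventing the class of bugs" as code is to register one named check per failure class and run every string through all of them; a new class of failure then means adding one function, not fixing strings one by one. The check names and structure here are illustrative, not the Translate Toolkit's API:

```python
import re

def check_newlines(src, trans):
    # A translation that drops or adds a trailing newline changes output.
    return src.endswith("\n") == trans.endswith("\n")

def check_xml_tags(src, trans):
    # Every markup tag in the source must survive translation intact.
    tags = lambda s: sorted(re.findall(r"</?\w+>", s))
    return tags(src) == tags(trans)

# One entry per class of bug we have seen in the wild.
CHECKS = {"newlines": check_newlines, "xmltags": check_xml_tags}

def failing_checks(src, trans):
    """Name every class of bug a translation exhibits."""
    return [name for name, check in CHECKS.items() if not check(src, trans)]

assert failing_checks("Open <b>file</b>", "Ouvrir <b>fichier</b>") == []
assert failing_checks("Open <b>file</b>", "Ouvrir fichier") == ["xmltags"]
```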
So one of the things we've been really looking at recently is two different sets of checks. We use GitHub; other repository hosts have similar mechanisms. It can show you whether your continuous integration tests are failing, and other metrics, for any change you're proposing, and we'd really like something similar: when developers commit to a branch they can see that the commit has added so many strings, translators can see straight away that those strings have been added, and on the flip side, when translations are being committed to the repository, the checks that currently happen in the translation environment could also be happening at that point of change.

Another key aspect is improving communication. For localizers, one of the most challenging parts of the job is getting the context of what a string is used for: the same string, or the same words, can mean quite different things in different contexts, and getting the context for an appropriate translation takes up a large amount of translators' time. So one thing we'd really like to look at is how we can improve that communication flow from translators to developers and back again. But there's also a lot within and between language teams that could be improved. The checks people develop are quite often specific to their language, but quite often they're not, and they work for other languages as well. So what we're looking at is how we can increase communication within language teams, between language teams, and from language teams to developers.

Just taking a quick step back: 10 or 15 years ago, if a project was forked it was seen as pretty much the worst thing that could happen, the death knell of a project. Then GitHub came along and basically started encouraging people to fork: people put ribbons on their websites saying "fork me on GitHub", and that really changed the way people saw forking. Nowadays most projects look at their GitHub page and regard it as a success metric that more people are forking their software, quite the opposite of the past. In part that change came about because of distributed version control: it lets you have local branches more easily, push changes between those local branches, and also have a shared upstream, or even multiple shared upstreams. But I think the real change came not with distributed versioning but with the pull request, because the pull request allowed you not only to fork but to offer those changes back, making it very clear what changes you were offering back to the main branch. So for localization to adopt some of those practices, the area we're looking at right now is how we can branch, how we can diff, and how we can merge those changes back: how we can apply that very successful process from software to localization.

We've developed Pootle, but I know that others are looking at similar technologies. A pretty critical part of making it frictionless is the VCS hookup: as soon as developers commit strings, those strings appear in whatever tool you use to localize, and conversely, as soon as you make changes to your localization, or soon after, those changes appear back in the repository. Pootle started out mostly as a file-based editor, and even now, even very large complex projects tend to have a very file-based view of localization. One of the things we're moving towards is seeing localization much more as a data source, so we can work not just with PO files or what have you, but directly with databases, or potentially even with websites or other sources of localization. As I said, we've worked mostly with PO; nowadays the localization landscape has changed dramatically, and even within a single project you're likely to see more than one localization format, so a pretty critical aspect of integrating with those projects is being able to work with the variety of formats they use.

There are also some technologies quite specific to localization that are relevant to building that organizational memory, translation memory being probably the most obvious one. For those of you who aren't aware, it lets you see how a string was translated previously and gives you prompts for how you might translate it. The other significant one, increasingly so, is machine translation. Historically machine translation has been pretty poor; that's changing quickly, and it's useful as an aid to speed up localization: even if it doesn't give you a perfect string, it can give you some pretty good prompts.

So, just finishing up: within Translate House our main goal is to support more free and open source projects. We're very much from the free and open source world, and our main aim is to see as much open source software localized into as many languages as possible. We'd like to make localization a given. Walking around the conference talking to people, even now many big projects just don't localize their software, and nowadays the technology is there to make it a lot easier, and the internet has made the market so much greater, so the need is so much greater too. Finally: we're always looking for localizers, but we're also looking for coders who have a passion for localization, or for widening participation in the internet and the web more generally. So, any questions?
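The translation-memory prompting described above can be sketched with a simple fuzzy match. A real TM server such as amaGama uses proper indexing and scoring; `tm_suggest` and its threshold here are purely illustrative:

```python
import difflib

def tm_suggest(source, memory, threshold=0.7):
    """Rank earlier translations of strings similar to `source`.

    `memory` maps previously translated source strings to their
    translations; matches below `threshold` similarity are dropped.
    """
    scored = sorted(
        (
            (difflib.SequenceMatcher(None, source.lower(), seen.lower()).ratio(),
             seen, target)
            for seen, target in memory.items()
        ),
        reverse=True,
    )
    return [(round(score, 2), target)
            for score, seen, target in scored
            if score >= threshold]

memory = {
    "Save the file": "Enregistrer le fichier",
    "Delete the file": "Supprimer le fichier",
    "Quit": "Quitter",
}
# A near-identical source string surfaces the earlier translation first.
suggestions = tm_suggest("Save the files", memory)
assert suggestions[0][1] == "Enregistrer le fichier"
```

The translator still decides; the memory just turns past work into prompts, which is exactly the organizational-memory point above.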
Come up with questions.

Hi. If you plan to use a unified storage system for localization, how would you put it together, given that different localization formats have different feature sets? Some may have plurals, some may not; some, like l20n, have a lot of extra features. How would you do that in a unified storage model?

I was thinking about that specifically in terms of Pootle. Currently Pootle still uses an internal representation that's pretty much like PO, which gives it some limitations, but PO is a pretty robust format, so you can express most of the other formats in that way. For us the answer to that question is Pootle: Pootle is where we bring localizations from different projects and different formats and put them into one place, where localizers can see them in one interface. The critical thing in terms of comparison and diffing is that the needs of localization are different: we don't really want to see how the layout or the formatting changed, we want to see that the content of the string changed, and that's what we want diffed across the changes coming from different places and formats.

Talking about teams and building up localizers: what would you suggest to someone showing up with a rare language, one that's not in the locale database, with no glibc support or anything, a completely new language? Do you recommend they just start translating the PO files, or go to the source first so that they get support for the basics?

My kind of view is not to discourage anyone.
My passion is to see even the smallest language active and activated on a Linux system. Not having a glibc locale can be a bit of a barrier, though in a lot of places it isn't. In the African context we tackled the missing-locale problem by just making it go away: we built about a hundred locales. So I would encourage someone like that to look at building a locale. I haven't looked for a while at whether the CLDR data gets pulled in, which would probably be the most effective way of doing it. They're not very hard to do if you can basically read South African English, because I documented it all. But the critical takeaway for me is that I don't like seeing any barrier stop someone; I'd rather help someone like that tackle the glibc locale problem, because once that's done it's literally gone, and then they're free to carry on.

I work together with the AsteroidOS team to manage translations there, and one of the troubles we're running into is that our applications are all Qt, so we have Qt's translation system for all the strings in the application, and then we have the .desktop files, which contain the app name, which also needs to be translated. We're currently using Weblate for that, but we can only translate the in-app strings, and we have to personally contact the translators to get the desktop entries translated. Do you have any tips on how to handle situations like that?

Right, so you've got one part that can be translated in Weblate and one part that can't. (Yeah, the Qt part can be translated in Weblate, and the rest is a .desktop file, which Weblate doesn't support out of the box.) So I would look at intltool, which can handle .desktop files and convert them to PO; that might be an approach, because I'm pretty sure you could then translate it in PO format. (Okay, thanks.) Other questions?
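For background, the freedesktop.org desktop entry format localizes keys with a `[lang]` suffix (`Name[de]=...`), so one round-trip approach is to extract those keys into a translation catalog and merge the finished translations back in. A minimal sketch of the merge step, assuming translations arrive as a `{lang: {key: text}}` mapping (`merge_desktop` is hypothetical, not part of intltool):

```python
def merge_desktop(desktop_text, translations):
    """Append `Key[lang]=value` lines for each translated key.

    `translations` maps a language code to {key: translated value},
    e.g. {"de": {"Name": "Uhr"}}; output follows the freedesktop.org
    convention of Name[de]=..., Comment[de]=...
    """
    lines = desktop_text.rstrip("\n").split("\n")
    for lang in sorted(translations):
        for key in sorted(translations[lang]):
            lines.append("%s[%s]=%s" % (key, lang, translations[lang][key]))
    return "\n".join(lines) + "\n"

desktop = "[Desktop Entry]\nName=Clock\nComment=A simple clock\n"
merged = merge_desktop(
    desktop, {"de": {"Name": "Uhr", "Comment": "Eine einfache Uhr"}}
)
assert "Name[de]=Uhr" in merged
assert "Comment[de]=Eine einfache Uhr" in merged
```

The extraction half would do the reverse: read `Name=`/`Comment=` into a PO-style catalog that the translation tool already understands.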
You were talking about quality of translation earlier. What's your process to ensure you have good quality translations? Because presumably different translators in the same language will translate differently.

When you're dealing with quality across translators, the things I look at, depending on the maturity of the language, are the resources that support translation: well-defined terminology, followed by translation memory. That would be how I try to get consistency. The thing we want to see language teams specifically do is build quality checks that can actually enforce that. But the crucial thing we've found is that, even before translation memory, the step is to get consistent terminology that relates to the domain being translated; that's the first step towards consistency. Part of the answer is also about oversight: with continuous localization, others can see a translation soon after it's made, which brings review into the process.

As a French web developer, one of the recurring issues we face is that English text fits a tiny button, but when we translate it into French it overflows, very often, because French words are often longer than English words. Do you have any suggestions on how to handle that more systematically? Graphical screenshots or anything?
There are a few points. The reality with translations is that this is a common problem: almost all translations inflate from the original, getting maybe 10% or 20% bigger depending on the language. There are a few approaches. One is to physically place limits on things, and put checks in place that enforce those limits; the problem is that you end up with very difficult-to-understand translations. So my first suggestion would actually be to look at how you can adapt the UI, if that's at all possible. In terms of screenshots, we've been toying with those ideas and thinking about what we could do there. The difficulty with screenshots is figuring out how to automate them so that you can see where things are overflowing, and I'm not sure how to do that. It's also worth mentioning in-place localization. It's not something that Pootle does at the moment, though it's something we've certainly considered developing. With in-place localization you go to the web page and translate it in the page itself, so the context the UI gives is available to the localization system; but pretty much every localizer I've spoken to would say that translating directly in the live UI is not what they want, rather the other way around. Any other questions? I think we have time. Keep your hand up. Thanks.

I have a question about a feature that is usually available in translation software: terminology, which wasn't mentioned. What are your thoughts about the extensive use of terminology in order to improve the quality of translations?
I think it depends a little on what's being translated. Last year I was in India, working with some government organizations there, and the point they made is that without terminology, even within a single language, the same word, because it's a relatively new word, may be translated many different ways; and an organization as big as the Indian government is dealing with, I think, eight official large languages and many, many dialects. So terminology, particularly for newer terms, is about standardizing, so a term isn't translated a different way each time. I think it's very valuable. In terms of addressing it in continuous localization: terminology is often neglected, but it's really about empowering the translators. My passion is seeing translators actually develop the terminology that applies to the domain they're localizing, so that the words they have problems with become the terminology.

Hi there. You were saying that as part of continuous integration it's a good idea to try to detect broken strings, for example. How would you concretely go about that? Does that mean you create a build of your software, start it up and run through a bunch of acceptance tests to see that the software doesn't crash because of a weird string? Or do you do screenshotting, or that kind of thing? What's the approach?

So the question is how you prevent strings from breaking builds, and what processes do that. I don't think there's a single answer. In Pootle you can translate things and it will show where there are critical errors. One of the things we've talked about quite a lot, and would like to add, is the ability to have something a bit like a pull request: when somebody makes a bunch of changes, you could see that those changes would create critical failures, and the critical failures tend to be the kind of thing that will break a build. Ideally they wouldn't get into your master translations; ideally they would be reviewed, and it's that review process that would be preventive, much like a pull request going off to Travis and failing: people won't land it. There's a second part, which we don't have yet but which would be very nice to have: when a pull request is made to push the translated strings, those checks are repeated where the developers can see them coming into their repository.

Just to describe one approach I would take, thinking of it as walls of defence. The first wall: if a string does fail, so something goes wrong when you try to get the string, or, in an app using gettext, the string fails to interpolate because the variables are broken in some way, then fall back to English. English is not ideal in the UI, but it's better than crashing or breaking, so that would be my first line of defence. What we're doing with continuous localization is trying to bring that further forward: using gettext itself to check that the file compiles, so that you're not shipping a compiled file that's broken, which is quite possible in certain contexts when you're not using gettext directly. And then, with the Translate Toolkit, we can write tests that look for variables that are broken, or XML that's broken. XML becomes quite problematic where you're doing some kind of interpretation on it: you're expecting the XML to look a certain way, you've used names that look evidently translatable, like "page" or "color", and people have translated them. So it's about building those checks so problems are caught right at the beginning. The critical thing is that the automated checks need to be close to 100% accurate: you can't really deal with false positives. But you can build enough checks to catch those problems right there, and that prevents them from ever going out. Thanks.

Hi, sort of related to the previous question. If we already have, say, a test suite that we put the localizations through when they come back from Pootle or Transifex or whatever we might be using at the time, would you envision some sort of mechanism or protocol whereby you could integrate your own systems to report back to Pootle? Say the translators make 20 changes on 20 strings, and in the background that magically creates a pull request, and then the source control and CI tools come back and report that these two strings failed with this error message, and so on. Would you think of integration tools like that, to limit the going back and forth and help translate things quicker?

So the question is whether we have anything in place that would stop it at the pull request stage, and whether we could get that communication back into Pootle or other tools. I don't see why not, in some respects. Certainly in terms of how we're thinking about it at the moment: if it's failing in Pootle, then on landing it's going to run the same checks, so in Pootle you would have that communication already. In terms of Transifex or other online platforms, we could expose some kind of API, but we haven't really thought that far yet. From my side, the critical place to catch it is right in the tool itself; that would be my ideal. Reporting back at a later date works, but it means the translator is then out of the loop, they're not on that string anymore, so the ideal is to catch it earlier. Okay, any other questions? Let's leave it there.
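The "no false positives" constraint mentioned in the answers above suggests splitting checks into blocking and advisory tiers, so that only checks precise enough never to cry wolf can stop a string from landing, while the rest merely flag strings for review. A hypothetical sketch of such a triage step, as a CI job might report results back to the translation tool:

```python
# Checks precise enough to block a commit outright, versus checks
# that merely flag a string for human review. The names are
# illustrative, not a real tool's check identifiers.
BLOCKING = {"placeholders", "xml"}

def triage(results):
    """Split per-string check failures into blockers and warnings.

    `results` is a list of (string_id, check_name) failures.
    """
    blockers = [(sid, c) for sid, c in results if c in BLOCKING]
    warnings = [(sid, c) for sid, c in results if c not in BLOCKING]
    return blockers, warnings

failures = [(17, "placeholders"), (42, "doublespaces")]
blockers, warnings = triage(failures)
assert blockers == [(17, "placeholders")]   # stops the merge
assert warnings == [(42, "doublespaces")]   # shown to the translator
```

Only the blocking tier gates the build; the advisory tier goes back to the translator while they are still on the string, which is the "catch it in the tool" ideal described above.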