[sldev] Looking at I18N formatting standards

Philippe Bossut (Merov Linden) merov at lindenlab.com
Fri Feb 20 15:51:24 PST 2009


Hi,

Aaaah! Plural, gender, etc... Right. As you know, ICU provides a
service for plural forms
(http://icu-project.org/userguide/formatMessages.html#CF). Since your
example for message formatting comes from the same page, I suppose you
already know about that. :)

gettext also covers the issue, even including cases where you have
more than 2 plural forms (e.g. some languages like Arabic have a
"dual" used for "2 things"). So it's more complicated than just
"one" and "many".

In general, for forms that depend on the cardinality of a value, the
best approach is to have a selector in the code and several strings:
2, 3 or more depending on the courage of the coder and the criticality
of the translation.
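
A minimal Python sketch of that "selector plus several strings" idea.
The MESSAGES table, the category names and the two-language rule set are
assumptions made up for illustration; real plural rules (gettext
Plural-Forms, ICU) cover far more cases, including the Arabic dual.

```python
# Hypothetical message table: each language supplies as many forms as it needs.
MESSAGES = {
    "en": {"one": "{n} day ago", "other": "{n} days ago"},
    "fr": {"one": "il y a {n} jour", "other": "il y a {n} jours"},
}


def plural_category(lang, n):
    """Simplified selector for two languages only (an assumption, not ICU)."""
    if lang == "fr":
        # French uses the singular form for 0 and 1.
        return "one" if n in (0, 1) else "other"
    # English uses the singular form for exactly 1.
    return "one" if n == 1 else "other"


def days_ago(lang, n):
    """Pick the whole string for this language and count, then fill in n."""
    return MESSAGES[lang][plural_category(lang, n)].format(n=n)


print(days_ago("en", 1))
print(days_ago("fr", 3))
```

The point is that the code only picks a category; every complete
sentence stays in the hands of the translator.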

For gender, it's even more complex as many languages have 2 or 3
genders (adding a "neuter" like in German). And then, you have the
combination of gender and plural when the language has special
grammatical agreement rules (like French). For instance, a message
like:
"%(number)d %(thing)s have been selected" will require no less than 4
variations since "selected" will have to be written "sélectionné",
"sélectionnés", "sélectionnée" or "sélectionnées" depending on
(number) and the gender of (thing). That beats your "nuevo/nueva"
example. Would you have guessed that "selected" was influenced by the
arguments (number) and (thing)? And French is easy compared to German
or Baltic or Slavic languages. Not to mention Asian languages...
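
To make the 4 variations concrete, here is a small Python sketch. The
THINGS table, its gender tags and the function name are hypothetical;
it only assumes that each localized noun can carry its gender along
with it.

```python
# One complete template per (gender, plural category) pair: 4 variations.
TEMPLATES = {
    ("m", "one"): "{number} {thing} sélectionné",
    ("m", "other"): "{number} {thing} sélectionnés",
    ("f", "one"): "{number} {thing} sélectionnée",
    ("f", "other"): "{number} {thing} sélectionnées",
}

# Hypothetical localized item table: (singular, plural, gender).
THINGS = {
    "object": ("objet", "objets", "m"),
    "texture": ("texture", "textures", "f"),
}


def selected_message(thing, number):
    """Choose the template from both the count and the noun's gender."""
    singular, plural, gender = THINGS[thing]
    category = "one" if number in (0, 1) else "other"
    noun = singular if category == "one" else plural
    return TEMPLATES[(gender, category)].format(number=number, thing=noun)


print(selected_message("object", 1))
print(selected_message("texture", 3))
```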

Even better: for your "nuevo/nueva" example, French has 3 forms,
"nouvel/nouveau/nouvelle". "nouvel" is used if [ITEMTYPE] is
masculine and its first letter is a vowel, e.g.:
- un nouvel objet
- un nouveau programme
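
A rough Python sketch of what such a selector would have to look at.
The function name and the vowel list are assumptions, and the rule is
deliberately simplified (it ignores the mute "h", for instance):

```python
# Simplified vowel set; a set is used so the empty string never matches.
VOWELS = frozenset("aeiouàâéèêëîïôùûü")


def nouveau_form(noun, gender):
    """Pick nouvel/nouveau/nouvelle from the noun's gender and first letter."""
    if gender == "f":
        return "nouvelle"
    if noun[:1].lower() in VOWELS:
        return "nouvel"
    return "nouveau"


print("un", nouveau_form("objet", "m"), "objet")
print("un", nouveau_form("programme", "m"), "programme")
```

So the right form depends not only on the gender but on the spelling
of the argument itself.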

Aaaagh...

Frankly, I don't think it's possible to spell out all the
possibilities for all possible languages. The best approach, I think,
is to use caution when creating the strings. Some rules of thumb:
- never concatenate strings: that's just plain evil as it assumes some
general grammar
- use as few arguments as possible in any string: because of the
combinatorial explosion mentioned above, and because it amounts to
string concatenation
- when using arguments, simply "print" the argument, separating it
from the sentence with punctuation (i.e. don't make it part of the
grammatical meaning), e.g. "Number of files: [NUMBER, integer]"
- use multiple strings and a selector when stuck: don't try to be too
smart creating complex argument lists
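
As an illustration of the third rule, here is a minimal Python sketch
of named substitution for the bracket syntax used in this thread. The
substitute() helper and its regular expression are assumptions for
illustration only, not the behavior of LLUIString.

```python
import re

# Matches [NAME] or [NAME,kind], with optional spaces after the comma.
PLACEHOLDER = re.compile(r"\[([A-Z]+)(?:,\s*([a-z]+))?\]")


def substitute(template, args):
    """Replace each named placeholder with its value from args."""
    def repl(match):
        name, kind = match.group(1), match.group(2)
        value = args[name]
        if kind == "integer":
            return str(int(value))
        return str(value)
    return PLACEHOLDER.sub(repl, template)


# The argument sits after the punctuation, outside the grammar of the
# sentence, so each translation only changes the fixed text.
print(substitute("Number of files: [NUMBER, integer]", {"NUMBER": 42}))
print(substitute("Nombre de fichiers : [NUMBER, integer]", {"NUMBER": 42}))
```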

Then, it's up to the translator to be a little smart and use his/her  
native language artfully to avoid the pitfalls. In your example for  
instance, here's what I'd do in French:

"Vous avez reçu un '[ITEMTYPE]' de la part de '[NAME]' il y a  
[DAYS,integer] jour(s)"

At least, that's how I'd translate it into French if I had to :)

Cheers,
- Merov


On Feb 20, 2009, at 2:16 PM, Steve Bennetts (Steve Linden) wrote:

> Great feedback, thanks!
>
> One other issue I've been thinking about: how to handle  
> pluralization and gender. For example:
>
> "[NAME] gave you a new [ITEMTYPE] [DAYS,integer] days ago."
>
> There are 3 potential problems here:
> 1. [NAME] gave - 'gave' might vary based on gender or familiarity.  
> This is pretty much impossible to solve since there is no practical  
> way to know the gender or relationship of, say, "M Linden".
>
> 2. new [ITEMTYPE] - 'new' might vary based on the gender of ITEMTYPE.
> In this case we could specify the gender, since ITEMTYPE is  
> presumably in a localized table somewhere. We could do something  
> like '[ITEMTYPE] [nuevo|nueva,gender(ITEMTYPE)]'. Has anyone seen  
> anything like this before?
>
> 3. [DAYS,integer] days - we see this problem in English all the
> time: "1 days ago". We could do something similar to the above
> example: '[DAYS,integer] [day|days,plural(DAYS)]'. Again, any good
> references for this sort of thing?
>
> Thanks,
> -Steve
>
>
>
> Philippe Bossut (Merov Linden) wrote:
>>
>> Hi,
>>
>> As someone who did i18n/l10n in a former project (and even did
>> translations from English to French...), here are my comments on
>> this subject:
>>
>> i18n (internationalization):
>> On Feb 17, 2009, at 11:48 AM, Steve Linden wrote:
>>> The I18N dev team is going to be tackling date, time, number, and  
>>> currency localization issues in the next couple of quarters. We  
>>> are looking at existing standards for replacing text inside a  
>>> message and want to cover as many as possible before making a  
>>> decision. Some possibilities that we are looking at include ICU  
>>> and XSLT. If anyone on this list is familiar with any other good  
>>> options, please reply to this thread.
>>
>> - ICU is great! It uses the Olson tables for date/time locale and
>> time-zone-sensitive formatting. Time zone support in particular can
>> be mind blowing. Don't underestimate this and think you can do your
>> own home brew "simple" version: TZ support is anything but
>> simple... ICU is by far the best here.
>> - Make sure you support primary and secondary locales as lots of  
>> people use 2 (a primary and a fallback).
>> - Make sure you support the country flavors (e.g. fr_CA, fr_BE,
>> etc...). Beware of their influence on data formatting (use of "."
>> instead of "," for decimal separator for instance)
>> - You didn't mention "sorting" in your list. That's also a service  
>> provided by ICU and should be used when presenting lists to users  
>> (and we've plenty of this in SL)
>> - There's also a Python version of ICU (PyICU) which can prove  
>> useful considering we've quite a bit of Python code floating around  
>> (though none with user facing strings... yet...)
>> - What about providing l10n for LSL? (/me ducks...) Seriously,  
>> that'd be really cool...
>>
>> l10n (localization):
>>> I am not particularly fond of indexed substitutions, I prefer
>>> name/value pairs, because it gives the translator a little more
>>> context, i.e. it is easier for a translator to look at "At [TIME]
>>> on [DATE], there was [EVENT] on planet [PLANET]" than "At {1,time}
>>> on {1,date}, there was {2} on planet {0,number,integer}."
>>>
>>> Our current compromise proposal would look something like this:
>>>
>>> std::string bar(const LLSD& sdargs)
>>> {
>>>     // bar = "At [DATE,time] on [DATE,date], there was [EVENT] on
>>>     //        planet [PLANET,integer]"
>>>     LLUIString foo = getString("bar");
>>>     foo.setLLSDArgs(sdargs);
>>>     return foo.getString();
>>> }
>>
>> +1 on (name/value) pairs in the code and big -1 on indexed  
>> substitutions. As a localizer, the less guess work I have to do  
>> about the context of a string, the faster I can get a translation  
>> out. I don't really care about the format that much and your  
>> example could easily be reordered in French as:
>>  "[EVENT] a eu lieu sur [PLANET,integer] le [DATE,date] à
>> [DATE,time]"
>>
>> If, however, you also plan to localize Python scripts, you may
>> want to use Python syntax rather than your own, i.e.:
>>  "At %(time)s on %(date)s, there was an %(event)s on planet
>> %(planet)d"
>>
>> But, heck, again, I've no religion here.
>>
>> One question: which translation tool will be available to  
>> translators? I used poedit in the past (http://www.poedit.net/) and  
>> it's pretty handy. That also opens the door for sldev community  
>> members to participate in the localization process. Of course, that  
>> supposes that there's a tool to convert SL resources to the .po  
>> format and back. Any plan for doing this?
>>
>> Cheers,
>> - Merov
>>
>>
>>
>> _______________________________________________
>> Policies and (un)subscribe information available here:
>> http://wiki.secondlife.com/wiki/SLDev
>> Please read the policies before posting to keep unmoderated posting  
>> privileges
