i18n guidelines
Internationalisation is tricky, and there are a lot of things that we software developers can do that make our translators' lives either a lot harder or a lot easier.
Here are a few notes on some of the most important issues.
(There are more great hints in MediaWiki's developer documentation, though some of them are MediaWiki-specific.)
Message formatting
Sentence splitting
Folks on TranslateWiki call this the "lego" phenomenon -- when text is broken up into little building blocks, the translator's life becomes much harder because sentence structure and word order can differ wildly between languages.
If you have multiple chunks in a running sentence or paragraph, try to use substitution rather than splitting it up in English order.
Bad:
$this->text(_('My text and files are available under '));
$this->element('a', array('href' => common_config('license', 'url')),
common_config('license', 'title'), _("Creative Commons Attribution 3.0"));
$this->text(_(' except this private data: password, '.
'email address, IM address, and phone number.'));
Good?
$this->text(sprintf(_('My text and files are available under %s '.
'except this private data: password, '.
'email address, IM address, and phone number.'),
$this->element('a',
array('href' => common_config('license', 'url')),
common_config('license', 'title'),
_("Creative Commons Attribution 3.0"))));
Many of our existing messages are still split this way and need cleanup; see bug 534.
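One way to keep the sentence whole is to build the link markup as a string first and substitute it into a single message. The following is only a sketch under that assumption: the helpers `license_link()` and `license_notice()` are hypothetical, the URL is a placeholder, and in real code the template would be passed through `_()` before substitution.

```php
<?php
// Hypothetical helper: build the link as a string, escaping both parts.
function license_link($url, $title)
{
    return sprintf('<a href="%s">%s</a>',
                   htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
                   htmlspecialchars($title, ENT_QUOTES, 'UTF-8'));
}

// Hypothetical helper: $template is the (already translated) message,
// containing exactly one %s placeholder for the link.
function license_notice($template, $url, $title)
{
    return sprintf($template, license_link($url, $title));
}

$english = 'My text and files are available under %s except this private data: '
         . 'password, email address, IM address, and phone number.';

// Stand-in for a translation whose word order moves the link to the front.
$reordered = '%s covers my text and files, except this private data: '
           . 'password, email address, IM address, and phone number.';

$url   = 'http://example.org/license';       // placeholder URL
$title = 'Creative Commons Attribution 3.0';

echo license_notice($english, $url, $title), "\n";
echo license_notice($reordered, $url, $title), "\n";
```

Because the translator sees the whole sentence with one placeholder, the link can move wherever the target language needs it.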
Plurals
In English, we're used to having singular and plural forms for nouns and verbs -- "a user", "two users", "1 listener is subscribed", "23 listeners are subscribed".
You should avoid rolling your own plural selection:
$text = sprintf(($count==1)
? _("%d listener is subscribed")
: _("%d listeners are subscribed"),
$count);
or worse, getting lazy because we "know" something will always be plural:
$text = sprintf(_("%d listeners are subscribed"), $count);
Many languages actually have more than one plural form, depending on what the count is! The translated text might need to be different for 2 listeners versus 5.
Instead, use the ngettext function; you pass both the English singular and plural forms in your source, along with the number that controls which form to use:
$text = sprintf(
ngettext("%d listener is subscribed",
"%d listeners are subscribed",
$count),
$count);
This will keep the plural form(s) together in the message file -- which your translators will appreciate! -- and will ensure that languages with complex plurals will be able to select the right form.
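To see why a simple singular/plural test is not enough, here is a small sketch of the plural rule commonly given for Russian in gettext `Plural-Forms` headers, which selects among three forms. The function name `plural_index_ru()` is hypothetical and exists only for illustration; with ngettext the translation catalog supplies this rule, so application code never has to write it.

```php
<?php
// Illustration only: returns which of the three Russian plural forms
// applies to $n, following the usual gettext Plural-Forms expression.
function plural_index_ru($n)
{
    if ($n % 10 == 1 && $n % 100 != 11) {
        return 0;   // 1, 21, 31, ...
    }
    if ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) {
        return 1;   // 2-4, 22-24, 32-34, ...
    }
    return 2;       // 0, 5-20, 25-30, ...
}

foreach (array(1, 2, 5, 11, 21, 22, 25) as $count) {
    printf("%d -> form %d\n", $count, plural_index_ru($count));
}
```

Note that 2 and 5 land on different forms, and that 21 uses the same form as 1, which a hand-rolled `$count == 1` check would get wrong.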
We have almost no usage of ngettext yet! We still need to double-check that plural messages are handled cleanly by the TranslateWiki interface, though reportedly they are.
Parameter order
Keep in mind that different languages may have different word order or sentence structure, so when doing multiple parameter substitution as with sprintf() it can be helpful to explicitly specify the order of parameters:
Good:
sprintf(_('%1$s has invited you to join them on %2$s'),
$bestname, $sitename);
This makes it easier to tell which parameters match up to which between the source and translated string.
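As a rough illustration of what numbered parameters buy the translator, the sketch below feeds the same arguments to the source string and to a reordered one. The reordered string is a made-up English stand-in for a translation, and the name and site values are placeholders.

```php
<?php
// Single quotes matter here: in a double-quoted string PHP would try to
// interpolate $s as a variable.
$source = '%1$s has invited you to join them on %2$s';

// Stand-in for a translation whose grammar puts the site name first.
$reordered = 'On %2$s, %1$s has invited you to join them';

$bestname = 'Alice';          // placeholder values
$sitename = 'Example Site';

// The call site passes the arguments in the same order both times.
echo sprintf($source, $bestname, $sitename), "\n";
echo sprintf($reordered, $bestname, $sitename), "\n";
```

The calling code does not change; only the message string decides where each parameter appears.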
Terminology
Consistency
It's generally best to stay consistent with usage, both of terms and tone.
- If "users" and "people" are both used, it should be clear how they differ; if they don't differ, we should standardize on one.
- Especially for jargon-y terms like "notice", "message", etc., we need to be clear and consistent. If a "notice" is public and a "message" is private, make sure that's clear from context and that we never use the wrong term.
- Things like word selection and use of contractions set tone; be consistent about, for instance, "can't" and "won't" versus "cannot" and "will not".
Context
Sometimes the same English message text is used in different contexts where it may need to be translated differently.
If it's both clearer and more natural to make the message more explicit, it can be best to do so, with the happy side effect that the messages become distinct and can be translated separately.
When the English message source absolutely must be the same in different contexts, it's best to use a message ID which allows them to be disambiguated. (I'm not 100% sure how to best handle this yet...)
Note also that translators will need to know what the context is to pick the right translation! Within the TranslateWiki interface you can add comments to clarify things. (We also want to make sure that source code references in the .po files are available on TranslateWiki notes, and preferably find a way to help automate showing UI context visually.)
Update 2009-11-18: Support for pgettext / npgettext, which take a context parameter, is being added, so we can start using these:
pgettext('tab', 'Login'); // Possibly a noun?
pgettext('button', 'Login'); // An action -- verb!
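For the curious, here is a minimal sketch of how a context parameter can disambiguate identical source text. It assumes the GNU gettext convention of storing the context and the message ID in one key, joined by an EOT character ("\004"). The `$catalog` array and the `context_lookup()` function are hypothetical stand-ins for a compiled message catalog and a real lookup; this is not our actual implementation.

```php
<?php
// Hypothetical in-memory catalog standing in for a compiled message file.
// Each key is: context, then "\004" (EOT), then the English message.
$catalog = array(
    "tab\004Login"    => '[noun] Login',
    "button\004Login" => '[verb] Login',
);

// Hypothetical lookup: falls back to the untranslated message when the
// context/message pair has no entry, much as gettext falls back to msgid.
function context_lookup(array $catalog, $context, $msgid)
{
    $key = $context . "\004" . $msgid;
    return isset($catalog[$key]) ? $catalog[$key] : $msgid;
}

echo context_lookup($catalog, 'tab', 'Login'), "\n";
echo context_lookup($catalog, 'button', 'Login'), "\n";
echo context_lookup($catalog, 'menu', 'Login'), "\n";   // no entry: falls back
```

The same English word yields two separate catalog entries, so translators can give each its own translation.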