I18n guidelines

Internationalisation is a tricky thing, and there are a lot of things that we software developers can do that make our translators' lives either a lot harder or a lot easier.

Here are a few notes on some of the most important issues...

(There are some more great hints in MediaWiki's developer documentation, though some of them are MediaWiki-specific.)

Sentence splitting
Folks on translatewiki.net call this the "lego" phenomenon -- when text is broken up into little building blocks, the translator's life becomes much harder because sentence structure and word order can differ wildly between languages.

If you have multiple chunks in a running sentence or paragraph, keep the whole sentence in one message and use substitution for the variable parts, rather than splitting it into pieces in English order.

Bad:

    $this->text(_('My text and files are available under '));
    $this->element('a', array('href' => common_config('license', 'url')),
                   common_config('license', 'title'),
                   _("Creative Commons Attribution 3.0"));
    $this->text(_(' except this private data: password, '.
                  'email address, IM address, and phone number.'));

Good:

    $this->text(sprintf(_('My text and files are available under %s '.
                          'except this private data: password, '.
                          'email address, IM address, and phone number.'),
                        $this->element('a',
                                       array('href' => common_config('license', 'url')),
                                       common_config('license', 'title'),
                                       _("Creative Commons Attribution 3.0"))));
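Without the `$this->text()`/`$this->element()` output calls, the same pattern can be sketched in plain PHP. This is a minimal sketch under some assumptions: `translate()` is a hypothetical stand-in for gettext's `_()` that reads from an in-memory array instead of a compiled .mo file, `license_link()` and `license_notice()` are hypothetical helpers, and the German text is only an illustrative translation. The link is built as an escaped string first and then dropped into the single complete sentence.

```php
<?php
// Hypothetical stand-in for gettext's _(): look the English source string up
// in an in-memory catalogue and fall back to the English text if it's missing.
function translate(string $msgid, array $catalogue): string
{
    return $catalogue[$msgid] ?? $msgid;
}

// Hypothetical helper: build the license link as an escaped HTML string
// instead of printing it in the middle of the sentence.
function license_link(string $url, string $title): string
{
    return sprintf(
        '<a href="%s">%s</a>',
        htmlspecialchars($url, ENT_QUOTES),
        htmlspecialchars($title, ENT_QUOTES)
    );
}

// The whole sentence is one message; the link is a single %s placeholder.
const LICENSE_MSGID = 'My text and files are available under %s except this private data: '
    . 'password, email address, IM address, and phone number.';

function license_notice(array $catalogue, string $url, string $title): string
{
    return sprintf(translate(LICENSE_MSGID, $catalogue), license_link($url, $title));
}

// Illustrative German catalogue entry.
$catalogueDe = [
    LICENSE_MSGID => 'Meine Texte und Dateien stehen unter %s zur Verfügung, ausgenommen '
        . 'diese privaten Daten: Passwort, E-Mail-Adresse, IM-Adresse und Telefonnummer.',
];

$url = 'https://creativecommons.org/licenses/by/3.0/';
$title = 'Creative Commons Attribution 3.0';

$english = license_notice([], $url, $title);
$german = license_notice($catalogueDe, $url, $title);
echo $english, "\n", $german, "\n";
```

German wants part of the verb phrase ("zur Verfügung") after the link; with the sentence kept whole, that's simply a matter of where the translator puts `%s`.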

Plurals
In English, we're used to having singular and plural forms for nouns and verbs -- "a user", "two users", "1 listener is subscribed", "23 listeners are subscribed".

You should avoid rolling your own plural selection:

    $text = sprintf(($count == 1)
                    ? _("%d listener is subscribed")
                    : _("%d listeners are subscribed"),
                    $count);

or worse, getting lazy because we "know" something will always be plural:

    $text = sprintf(_("%d listeners are subscribed"), $count);

Many languages actually have more than one plural form, depending on what the count is! The translated text might need to be different for 2 listeners versus 5.
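To make that concrete: GNU gettext describes each language's plural rule in the `Plural-Forms` header of its message file. The sketch below hand-translates two rules listed in the GNU gettext manual into PHP, English (`nplurals=2; plural=(n != 1);`) and Polish, and picks the matching form of the Polish noun "plik" (file). The function names are hypothetical; it's only an illustration, since in real code the gettext library evaluates the rule for you.

```php
<?php
// English: nplurals=2; plural=(n != 1);
function plural_index_en(int $n): int
{
    return $n != 1 ? 1 : 0;
}

// Polish: nplurals=3;
// plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
function plural_index_pl(int $n): int
{
    if ($n == 1) {
        return 0;
    }
    $lastDigit = $n % 10;
    $lastTwo = $n % 100;
    if ($lastDigit >= 2 && $lastDigit <= 4 && ($lastTwo < 10 || $lastTwo >= 20)) {
        return 1;
    }
    return 2;
}

// The three Polish forms of "%d file(s)", indexed by plural_index_pl().
$polishForms = ['%d plik', '%d pliki', '%d plików'];

foreach ([1, 2, 5, 12, 22, 25] as $n) {
    printf("%d: en form %d, pl \"%s\"\n",
        $n, plural_index_en($n), sprintf($polishForms[plural_index_pl($n)], $n));
}
```

So 2 and 22 take one plural form while 5, 12 and 25 take another, which a `$count == 1` test can't express.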

Instead, use the ngettext function; you pass both the English singular and plural forms in your source, along with the number that controls which form to use:

    $text = sprintf(ngettext("%d listener is subscribed",
                             "%d listeners are subscribed",
                             $count),
                    $count);

This will keep the plural form(s) together in the message file -- which your translators will appreciate! -- and will ensure that languages with complex plurals will be able to select the right form.
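Roughly what happens behind `ngettext()` can be sketched with a hypothetical in-memory catalogue (real gettext reads compiled .mo files, and `catalogue_ngettext()` below is not a real API). The entry is keyed by the English singular and carries every translated form together, and the language's plural rule picks the index. With no translation, it falls back the way GNU `ngettext` is documented to: the singular for n == 1, the plural otherwise. The "%d file" message and its Polish forms are illustrative.

```php
<?php
// Hypothetical catalogue, modelled on a .po entry with
// msgid / msgid_plural / msgstr[0..2].
$catalogue = [
    'plural' => function (int $n): int {
        // Polish rule from the GNU gettext manual.
        if ($n == 1) {
            return 0;
        }
        return ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) ? 1 : 2;
    },
    'messages' => [
        '%d file' => ['%d plik', '%d pliki', '%d plików'],
    ],
];

// Hypothetical ngettext-like lookup.
function catalogue_ngettext(?array $catalogue, string $singular, string $plural, int $n): string
{
    $forms = $catalogue['messages'][$singular] ?? null;
    if ($forms === null) {
        // No translation: same fallback as GNU ngettext.
        return $n == 1 ? $singular : $plural;
    }
    return $forms[$catalogue['plural']($n)];
}

foreach ([1, 3, 5] as $n) {
    echo sprintf(catalogue_ngettext($catalogue, '%d file', '%d files', $n), $n), ' / ',
         sprintf(catalogue_ngettext(null, '%d file', '%d files', $n), $n), "\n";
}
```

The call site passes the same two English strings and the count either way; only the catalogue decides how many forms exist.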

Parameter order
Keep in mind that different languages may have different word order or sentence structure, so when substituting multiple parameters with sprintf, it helps to number them explicitly:

Good:

    sprintf(_('%1$s has invited you to join them on %2$s'),
            $bestname, $sitename);

This makes it easier to tell which parameter is which in both the source and the translated string, and it lets a translation use the parameters in a different order from the English.
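PHP's `sprintf()` supports these numbered `%n$s` specifiers natively, so the call site stays the same while a translation reorders the values. In the sketch below the German string is an illustrative translation (not from a real catalogue) that names the site before the inviter, and the user and site names are made-up values.

```php
<?php
$bestname = 'Alice';        // example values only
$sitename = 'example.net';

$english = '%1$s has invited you to join them on %2$s';
// Illustrative German translation that mentions the site first.
$german = 'Auf %2$s hat %1$s dich zum Mitmachen eingeladen.';

// Identical call; only the message text differs.
$en = sprintf($english, $bestname, $sitename);
$de = sprintf($german, $bestname, $sitename);

// With plain %s placeholders the arguments are consumed strictly left to right,
// so the same reordered translation puts the names in the wrong slots.
$deUnnumbered = sprintf('Auf %s hat %s dich zum Mitmachen eingeladen.', $bestname, $sitename);

echo $en, "\n", $de, "\n", $deUnnumbered, "\n";
```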

Consistency
It's generally best to stay consistent in usage, of both terms and tone.


 * If "users" and "people" are both used, it should be clear how they differ; if they don't differ, we should standardize on one.
 * Especially for jargon-y terms like "notice" and "message", we need to be clear and consistent. If a "notice" is public and a "message" is private, make sure that's clear from context and that we never use the wrong term.
 * Things like word choice and the use of contractions set the tone; be consistent about, for instance, "can't" and "won't" versus "cannot" and "will not".
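One cheap way to keep an eye on the contraction point is to scan the English source messages for both styles. This is a hypothetical helper, not part of any existing tool: it takes a list of message strings and a small hand-written table of contraction/long-form pairs (an assumption; extend it to taste) and reports the pairs where both spellings appear. The sample messages are made up.

```php
<?php
// Hypothetical table of contraction => long form pairs to check.
const STYLE_PAIRS = [
    "can't" => 'cannot',
    "won't" => 'will not',
    "don't" => 'do not',
];

// Returns the pairs for which both the contracted and the long form occur
// somewhere in the messages (case-insensitive substring match,
// straight apostrophes only).
function mixed_contractions(array $messages): array
{
    $mixed = [];
    foreach (STYLE_PAIRS as $short => $long) {
        $hasShort = false;
        $hasLong = false;
        foreach ($messages as $msg) {
            $hasShort = $hasShort || stripos($msg, $short) !== false;
            $hasLong = $hasLong || stripos($msg, $long) !== false;
        }
        if ($hasShort && $hasLong) {
            $mixed[] = "$short / $long";
        }
    }
    return $mixed;
}

$messages = [
    "You can't delete another user's notice.",
    'You cannot send a message to this user.',
    "Notice won't be sent.",
];
print_r(mixed_contractions($messages));
```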

Context
Sometimes the same English message text is used in different contexts where it may need to be translated differently.

If making the message more explicit is both clearer and more natural, it's best to do so, with the happy side effect that the messages are no longer identical and can be translated separately.

When the English source text absolutely must be the same in different contexts, it's best to use a message ID that allows the different uses to be disambiguated. (I'm not 100% sure how best to handle this yet...)

Note also that translators will need to know what the context is to pick the right translation!
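GNU gettext's answer to this is message context: a .po entry can carry a `msgctxt` line next to its `msgid`, so identical English strings become separate entries, and translators see the context while translating. Internally the C library looks such messages up under the key context + "\004" + msgid. PHP's gettext extension doesn't provide `pgettext()`, so the sketch below fakes the lookup with a hypothetical in-memory catalogue and a hypothetical `pgettext_stub()`; the context labels and German translations are illustrative. (Separately, xgettext's `--add-comments` option can copy a developer comment placed just before a call into the .po file.)

```php
<?php
// Separator GNU gettext places between msgctxt and msgid in compiled catalogues.
const CONTEXT_GLUE = "\x04";

// Hypothetical lookup: try the context-qualified key, and fall back to the
// untranslated English msgid when there is no entry.
function pgettext_stub(array $catalogue, string $context, string $msgid): string
{
    return $catalogue[$context . CONTEXT_GLUE . $msgid] ?? $msgid;
}

// Illustrative German catalogue: "Open" as a button label (verb)
// versus "Open" as a status (adjective).
$catalogueDe = [
    'button' . CONTEXT_GLUE . 'Open' => 'Öffnen',
    'status' . CONTEXT_GLUE . 'Open' => 'Offen',
];

echo pgettext_stub($catalogueDe, 'button', 'Open'), "\n"; // Öffnen
echo pgettext_stub($catalogueDe, 'status', 'Open'), "\n"; // Offen
echo pgettext_stub([], 'button', 'Open'), "\n";           // Open
```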