RTL support

A number of languages, primarily those written in the Arabic and Hebrew scripts, are written and laid out right-to-left (RTL) rather than left-to-right (LTR) as we're used to in the West.

Most modern operating systems and browsers have pretty good support for this, but there are a number of gotchas that we need to watch out for as developers, and in some places we need to provide explicit support.

Layout
In an RTL language environment, text is laid out starting from the right edge of the window. To establish the document direction, it's crucial that the "lang" and "dir" attributes are set correctly on the top-level HTML element. They may also need to be overridden in other parts of the UI where block-level chunks have a different directionality.
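As a minimal sketch of keeping "lang" and "dir" in sync, the Python below derives the direction from a language code and builds the top-level tag. The function names and the short language list are assumptions made for illustration; a real list of RTL languages would be longer.

```python
from html import escape

# Assumption: a small, incomplete set of primary language codes
# whose usual script runs right-to-left.
RTL_LANGUAGES = {"ar", "fa", "he", "ur", "yi"}


def direction_for(lang: str) -> str:
    """Return "rtl" or "ltr" for a language code such as "he" or "en-GB"."""
    primary = lang.replace("_", "-").split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def html_open_tag(lang: str) -> str:
    """Build the opening <html> tag with matching lang and dir attributes."""
    return '<html lang="{}" dir="{}">'.format(
        escape(lang, quote=True), direction_for(lang)
    )


print(html_open_tag("he"))     # <html lang="he" dir="rtl">
print(html_open_tag("en-GB"))  # <html lang="en-GB" dir="ltr">
```

The same direction_for() helper could supply the "dir" value for a block-level override elsewhere in the UI.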

It's also customary to lay out the entire user interface from the right edge, mirrored from the customary Western LTR layout. This Wikipedia example shows MediaWiki's layout correctly flipped for the Arabic version:



Unfortunately, doing this is complicated a bit by the HTML and CSS currently in use, since style sheets tend to end up with a lot of hardcoded "left" and "right" specifiers.

If layout is controlled primarily through CSS, it can be easier to generate an RTL variant style sheet with a tool like cssjanus, which automates the left/right flipping.
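To show the basic idea behind automated flipping, here is a toy Python sketch that swaps the whole words "left" and "right" in a style sheet. It is not cssjanus and does not use its API; flip_css is a hypothetical name used only for this illustration.

```python
import re

_SWAP = {"left": "right", "right": "left"}
# \b keeps the match to whole words, so "margin-left" matches
# (the hyphen is a word boundary) but "copyright" does not.
_DIRECTION_WORD = re.compile(r"\b(left|right)\b")


def flip_css(css: str) -> str:
    """Swap every whole-word "left" and "right" in a CSS string."""
    return _DIRECTION_WORD.sub(lambda m: _SWAP[m.group(1)], css)


ltr_css = "#nav { float: left; margin-right: 1em; }"
print(flip_css(ltr_css))  # #nav { float: right; margin-left: 1em; }
```

This naive version also rewrites "left" or "right" inside comments and URLs, and it ignores four-value shorthands such as "padding: 0 1em 0 2em", whose second and fourth values would need swapping as well. A dedicated tool is the safer choice for real style sheets.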

Bidi
Right-to-left text doesn't occur in isolation; left-to-right portions such as numerals and English words will usually be found embedded in the text, making it bidirectional.

The actual inline text layout is handled at the browser level, based on the standard Unicode bidirectional (bidi) layout algorithm, and usually we don't have to worry about it.

Sometimes, however, the usual bidi handling gets confused by the way text is embedded, especially at punctuation boundaries. It may be necessary at times to add control characters or other explicit markers at boundary areas to ensure that, for instance, an Arabic message in an English environment doesn't swap the direction of surrounding English UI text. (Or vice versa!)
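One way to add such markers is to wrap user-supplied text in the Unicode directional isolate characters before joining it with UI text. The Python below is a sketch under that assumption; isolate and render_line are hypothetical helper names, and in HTML the same effect can be had with a "dir" attribute on a wrapping element.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction taken from the first strong character
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes the isolate


def isolate(text: str) -> str:
    """Wrap text so its direction cannot affect the text around it."""
    return FSI + text + PDI


def render_line(author: str, notice: str) -> str:
    """Join LTR UI text with user-supplied parts, each isolated separately."""
    return "Posted by " + isolate(author) + ": " + isolate(notice)


line = render_line("dana", "\u05e9\u05dc\u05d5\u05dd!")  # Hebrew word plus "!"
print(repr(line))
```

Because each user-supplied part is isolated, the colon and the English label keep their LTR order regardless of what the notice contains.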

Note that most punctuation characters are considered direction-neutral (e.g. the "(" character is "open paren", not "left paren"), and will flip around depending on their surrounding context. This tends to be extra fragile in mixed-language environments. Most of the time our notice text is set off on its own, though, so the impact may be limited.
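The neutrality of punctuation can be checked against the Unicode character database. This short Python sketch uses the standard unicodedata module; bidi_info is a hypothetical helper name.

```python
import unicodedata


def bidi_info(ch: str):
    """Return (bidi class, mirrored flag) for a single character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))


# "(" is class ON (other neutral) and mirrored: it has no direction of
# its own, and its glyph is flipped when it is laid out in an RTL run.
print(bidi_info("("))       # ('ON', True)
print(bidi_info("a"))       # ('L', False)   strong left-to-right
print(bidi_info("\u05d0"))  # ('R', False)   Hebrew alef, strong right-to-left
```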

Current breakage
Here's a Hebrew-language notice, shown first as it displays now on identi.ca, then corrected to proper right-to-left direction; an English approximation shows how each appears to a right-to-left reader:

(Note that each run of words has its own inherent directionality, but those runs are then laid out in an overall order that matches the surrounding directionality.)

When laid out in an LTR context, the RTL parts of the text are considered to be "embedded" in the surrounding LTR text, leading to incorrect ordering between the Hebrew and English word runs and the punctuation.

With dir="rtl" set, the notice gets correctly laid out as RTL text (Hebrew sentence) with embedded LTR text (English name of a game), and the alignment and punctuation are in the right place.

Localization
Incomplete localizations can complicate testing the layout since extra English text means more bidi layout issues. Making sure that localizations are as complete as possible means devs can concentrate effort on actual danger points such as boundaries between user-supplied text and UI text.

Scenarios

 * LTR interface, LTR content - OK
 * LTR interface, RTL content - currently broken
    * there are some patches floating around that can do the flip for individual notices based on the presence of RTL chars; these need testing
 * RTL interface, RTL content - should be OK if we flip the UI
 * RTL interface, LTR content - may want to flip individual notices back to LTR
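The four scenarios above reduce to one rule: a notice needs an explicit direction only when its content runs against the interface. A minimal Python sketch of that rule follows; notice_dir_attribute is a hypothetical name, and the content direction is assumed to be known already.

```python
VALID_DIRECTIONS = ("ltr", "rtl")


def notice_dir_attribute(interface_dir: str, content_dir: str) -> str:
    """Return a dir attribute for a notice, or "" if none is needed."""
    for direction in (interface_dir, content_dir):
        if direction not in VALID_DIRECTIONS:
            raise ValueError("unknown direction: " + repr(direction))
    if content_dir == interface_dir:
        return ""  # the notice inherits the page direction
    return ' dir="{}"'.format(content_dir)


for ui in VALID_DIRECTIONS:
    for content in VALID_DIRECTIONS:
        print(ui, content, repr(notice_dir_attribute(ui, content)))
```

Always emitting the attribute would also work; leaving it off in the matching cases just keeps the markup smaller.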

Detection of language can be a bit tricky; a user's language setting might not actually match the primary language of any given post, and text may have come from an external service where we don't have that info anyway.

We can at least detect if a notice looks mostly RTL or not based on relative proportion of RTL characters, though there may be some false results for mixed-language texts. (Note that with ASCII usernames and URLs and such, most RTL notices will contain some portion of LTR text!)
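A rough version of this proportion check can be sketched in Python with the standard unicodedata module. Only characters with a strong direction are counted, so digits, spaces and punctuation don't skew the result. The function names and the 0.5 threshold are assumptions for illustration, not tuned values.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left: Hebrew, Arabic and similar


def rtl_ratio(text: str) -> float:
    """Fraction of strongly-directional characters that are RTL."""
    rtl = ltr = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            rtl += 1
        elif bidi_class == "L":
            ltr += 1
    total = rtl + ltr
    return rtl / total if total else 0.0


def looks_rtl(text: str, threshold: float = 0.5) -> bool:
    """Guess whether a notice should be laid out right-to-left."""
    return rtl_ratio(text) > threshold


print(looks_rtl("\u05e9\u05dc\u05d5\u05dd @bob"))  # Hebrew word plus an ASCII username
print(looks_rtl("hello world"))
```

A short RTL notice with a long URL can still fall below the threshold, which is the kind of false result mentioned above.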

Open tickets

 * ticket 1346: automatic direction support
 * ticket 1979: add right-to-left writing support
 * merge req 1195: RTL fix that was implemented via JavaScript, done with a plugin...