[Python-3000] pep 3131 again
Jason Orendorff
jason.orendorff at gmail.com
Thu May 17 19:55:57 CEST 2007
Martin, this message suggests an addition to PEP 3131.
On 5/16/07, tomer filiba <tomerfiliba at gmail.com> wrote:
> === RTL/LTR ===
> the only practical way to use RTL languages in code is to have an RTL
> programming language, where "if" is spelled "אם", "for" as "עבור",
> "in" as "בתוך", and so on, and the entire program is RTL. having code
> like --
>
> for קקי in פיפי(1,2,3)
>
> is only unreadable by all means (since the parenthesis are LTR, while
> the name is RTL, etc.)
In theory, the Right Thing to do here is to support Unicode bidi
format control characters. Check this out:
for קקי in פיפי(1,2,3):
    blort(קקי‎)
I just added U+200E, "LEFT-TO-RIGHT MARK", after each
misbehaving RTL identifier, as recommended here:
http://unicode.org/reports/tr9/#Usage
Note: some mail/news agents strip out format characters.
(.gnikrow era sretcarahc lortnoc idib ,siht daer nac uoy fI)
(If you can read this, control characters were stripped/ignored.)
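The insertion described above (U+200E after each misbehaving RTL
identifier) can be done mechanically. Here is a minimal sketch, not a
finished editor hook. The function names are made up for illustration,
and as a simplification it treats every run of strong-RTL characters
the same way, so it does not yet skip strings and comments:

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def is_strong_rtl(ch):
    # Bidi classes "R" and "AL" are the strong right-to-left classes.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def add_lrm_after_rtl_runs(line):
    """Append U+200E after every run of strong-RTL characters.

    Hypothetical helper: it does not know about strings or comments,
    so a real editor hook would have to exclude those spans itself.
    """
    out = []
    prev_rtl = False
    for ch in line:
        rtl = is_strong_rtl(ch)
        # A run just ended; add the mark unless one is already there.
        if prev_rtl and not rtl and ch != LRM:
            out.append(LRM)
        out.append(ch)
        prev_rtl = rtl
    if prev_rtl:
        # The line ended inside an RTL run.
        out.append(LRM)
    return "".join(out)
```

Running the function on its own output changes nothing, so it should
be safe to apply every time a file is saved.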
Now... it's clearly absurd to be pasting invisible magic characters
into source code, but that part is automatable. Just hack your
editor to add U+200E after each run of strong-RTL characters,
except in strings and comments. The real problems are:
 1. Many editors don't have bidi support. This might improve
    with time. Or not.

 2. Python forbids these characters. Martin, JavaScript
    treats these specially, and I think Python probably
    should, too:

    The ECMAScript 3 standard for JavaScript requires the
    tokenizer to throw away all Unicode format-control characters
    (general category Cf).

    ECMAScript 4 will likely tweak this (an incompatible change)
    to retain those characters only in strings and regexps.
    I like that better.
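To make the two behaviors concrete, here is a rough sketch of each as
a preprocessing pass over source text. This is illustrative only and
is not how any real tokenizer is written. Both function names are
invented, and the second function only understands single- and
double-quoted strings with backslash escapes (no regexps, comments, or
triple-quoted strings):

```python
import unicodedata


def strip_format_controls(source):
    # ECMAScript 3 style: drop every character in general category Cf.
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")


def strip_format_controls_outside_strings(source):
    # Proposed ECMAScript 4 style, simplified: keep Cf characters that
    # appear inside quoted strings and drop them everywhere else.
    out = []
    quote = None      # current string delimiter, or None outside a string
    escaped = False   # True right after a backslash inside a string
    for ch in source:
        if quote is None:
            if unicodedata.category(ch) == "Cf":
                continue
            if ch in "'\"":
                quote = ch
        elif escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == quote:
            quote = None
        out.append(ch)
    return "".join(out)
```

With the second variant, an identifier followed by U+200E tokenizes as
if the mark were absent, while a string literal that contains U+200E
keeps it.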
Cheers,
-j