[Python-3000] pep 3131 again

Jason Orendorff jason.orendorff at gmail.com
Thu May 17 19:55:57 CEST 2007


Martin, this message suggests an addition to PEP 3131.

On 5/16/07, tomer filiba <tomerfiliba at gmail.com> wrote:
> === RTL/LTR ===
> the only practical way to use RTL languages in code is to have an RTL
> programming language, where "if" is spelled "אם", "for" as "עבור",
> "in" as "בתוך", and so on, and the entire program is RTL. having code
> like --
>
> for קקי in פיפי(1,2,3)
>
> is only unreadable by all means (since the parentheses are LTR, while
> the name is RTL, etc.)

In theory, the Right Thing to do here is to support Unicode bidi
format control characters.  Check this out:

  for קקי in פיפי‎(1,2,3):
      blort(קקי)

I just added U+200E, "LEFT-TO-RIGHT MARK", after each
misbehaving RTL identifier, as recommended here:
  http://unicode.org/reports/tr9/#Usage

Note: some mail/news agents strip out format characters.
(‮.gnikrow era sretcarahc lortnoc idib ,siht daer nac uoy fI‬‎)
(‮If you can read this, control characters were stripped/ignored.‬‎)
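Placing the marks can be done mechanically. Here is a minimal sketch in Python. It assumes the standard `tokenize` module is used to tell identifiers apart from string literals and comments, and it takes "strong RTL" to mean the UAX #9 bidi classes R and AL. `add_lrm` and `is_strong_rtl` are hypothetical helper names.

```python
import io
import tokenize
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def is_strong_rtl(ch: str) -> bool:
    # Bidi classes R (e.g. Hebrew) and AL (e.g. Arabic) are strong right-to-left.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def add_lrm(source: str) -> str:
    """Append U+200E after each run of strong-RTL characters in identifiers.

    Only NAME tokens are touched, so string literals and comments are left alone.
    """
    # Split lines the same way tokenize's readline does (on "\n" only).
    lines = io.StringIO(source).readlines()
    inserts: dict[int, list[int]] = {}  # 1-based line number -> column offsets

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        row, col = tok.start
        text = tok.string
        for i, ch in enumerate(text):
            run_ends = i + 1 == len(text) or not is_strong_rtl(text[i + 1])
            if is_strong_rtl(ch) and run_ends:
                inserts.setdefault(row, []).append(col + i + 1)

    out = []
    for lineno, line in enumerate(lines, start=1):
        # Insert from right to left so earlier column offsets stay valid.
        for col in sorted(inserts.get(lineno, []), reverse=True):
            line = line[:col] + LRM + line[col:]
        out.append(line)
    return "".join(out)


# The loop from this message, with the Hebrew written as escapes.
example = "for \u05e7\u05e7\u05d9 in \u05e4\u05d9\u05e4\u05d9(1,2,3):\n    blort(\u05e7\u05e7\u05d9)\n"
print(ascii(add_lrm(example)))
```

This marks every RTL run, as the editor hack would, not only the identifiers that visibly misbehave. The output still cannot be fed to Python as-is, because Python rejects U+200E in source.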

Now... it's clearly absurd to be pasting invisible magic characters
into source code, but that part is automatable.  Just hack your
editor to add U+200E after each run of strong-RTL characters,
except in strings and comments.  The real problems are:

1.  Many editors don't have bidi support.  This might improve
with time.  Or not.

2.  Python forbids these characters.  Martin, JavaScript
treats these specially, and I think Python probably
should, too:

The ECMAScript 3 standard for JavaScript requires the
tokenizer to throw away all Unicode format-control characters
(general category Cf).

ECMAScript 4 will likely tweak this (an incompatible change)
to retain those characters only in strings and regexps.
I like that better.
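Here is a minimal Python sketch of the ES3 rule. It assumes the rule is applied as a preprocessing step before Python's own tokenizer, which is not something PEP 3131 specifies. `strip_format_controls` and `compiles` are hypothetical names. The sketch also shows why the ES3 rule is blunt: it deletes format characters inside string literals too.

```python
import unicodedata


def strip_format_controls(source: str) -> str:
    """Drop every general-category Cf character, ES3-style, before lexing.

    This also removes them inside string literals, which is what the
    proposed ES4 change would stop doing.
    """
    return "".join(ch for ch in source if unicodedata.category(ch) != "Cf")


def compiles(source: str) -> bool:
    # True if CPython accepts the source text (names need not be defined).
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


# The loop from this message with U+200E after each Hebrew identifier
# (Hebrew written as escapes so the bidi algorithm can't reorder it here).
annotated = (
    "for \u05e7\u05e7\u05d9\u200e in \u05e4\u05d9\u05e4\u05d9\u200e(1,2,3):\n"
    "    blort(\u05e7\u05e7\u05d9\u200e)\n"
)

print(compiles(annotated))                         # rejected: U+200E in source
print(compiles(strip_format_controls(annotated)))  # accepted once Cf is dropped
print(ascii(strip_format_controls('s = "x\u200fy"')))  # the RLM in the string is lost too
```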

Cheers,
-j