Re: [Python-Dev] Python and the Unicode Character Database

Dec. 2, 2010

      On Thursday, 2 December 2010 at 13:14 -0500, Alexander Belopolsky wrote:
...
...
I don't understand why you think Arabic or Hebrew text is any different
from Western text. Surely right-to-left isn't more conceptually
complicated than left-to-right, is it?
No, but a mix of LTR and RTL is certainly more difficult than either
of the two.  I invite you to digest Unicode Standard Annex #9 before
we continue this discussion.
See <http://unicode.org/reports/tr9/>.
“This annex describes specifications for the *positioning* of characters
flowing from right to left” (emphasis mine)

Looks like something for implementors of rendering engines, which
python-dev is not AFAICT.
...
Same users may want to be able to cut and paste their decimals as
well.  More importantly, however, legacy formats may not have support
for mixed-direction text and may require that "John is 41" be stored
as "41 si nhoJ" and Unicode converter would turn it into "[RTL]John is
14"  that will still display as  "41 si nhoJ", but int(s[-2:]) will
return 14, not 41.
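
A minimal sketch of the situation described above, in Python 3. It assumes
the "[RTL]" marker stands for U+202E (RIGHT-TO-LEFT OVERRIDE); the thread
does not say which control character a converter would actually emit, and
the reversed string is only a naive stand-in for what a bidi-aware renderer
would show.

```python
# Assumption: "[RTL]" in the example above is U+202E RIGHT-TO-LEFT OVERRIDE.
RLO = "\u202e"

# Hypothetical converter output: stored in logical order, prefixed by RLO.
s = RLO + "John is 14"

# Indexing and slicing work on the stored (logical) order, not on what
# the reader sees on screen.
value = int(s[-2:])

# Naive simulation of the displayed text: drop the control character and
# reverse the rest, as an RTL override would do visually.
displayed = s[1:][::-1]

print(repr(displayed), value)
```

The point of the sketch is only that int() sees the stored code points; it
does not take any position on whether such a conversion is a realistic one.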
The legacy format argument looks like a red herring to me. When
converting from one format to another, it is the programmer's job to
do his/her job right.
...
...
...
If we've got it right for Arabic, it is by
chance and not by design.  This still leaves us with 41 other types of
digits for at least 30 different languages.
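
For anyone who wants to check the figure, here is a rough sketch that lists
the digit sets known to the unicodedata module by collecting every character
of category Nd whose decimal value is 0. The result depends on the Unicode
version bundled with the interpreter, so it will not necessarily match the
number quoted above, and it also includes non-language sets such as the
mathematical digit styles.

```python
import sys
import unicodedata

# Every decimal-digit set has exactly one "zero"; collect those zeros.
zeros = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Nd"
    and unicodedata.decimal(chr(cp), None) == 0
]

print(unicodedata.unidata_version, len(zeros))
for z in zeros[:5]:
    # The default argument guards against unnamed code points.
    print("U+%04X" % ord(z), unicodedata.name(z, "<unnamed>"))
```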
So why do you trust the Unicode standard on other things and not on this
one?
What other things?
Everything which the Unicode database stores and that we already rely
on.
...
As far as I understand the only str method that was
designed to comply with Unicode recommendations was str.isidentifier().
I don't think so.  str.split() and str.splitlines() are also defined in
conformance to the SPEC, AFAIK.  They certainly try to.
And, outside of str itself, the re module tries to follow Unicode
categories as well (for example, "\d" should match non-ASCII digits).
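
A quick sketch of the behaviours mentioned above, as observed in Python 3.
The sample strings are chosen purely for illustration: ARABIC-INDIC digits
for "\d" and int(), EM SPACE (U+2003) for str.split(), and LINE SEPARATOR
(U+2028) for str.splitlines().

```python
import re

arabic_indic = "\u0664\u0661"  # ARABIC-INDIC DIGIT FOUR, DIGIT ONE

# For str patterns, "\d" matches any Unicode decimal digit by default...
unicode_match = re.fullmatch(r"\d+", arabic_indic)
# ...and only [0-9] when the re.ASCII flag is given.
ascii_match = re.fullmatch(r"\d+", arabic_indic, flags=re.ASCII)

# int() accepts the same decimal digits.
as_int = int(arabic_indic)

# str.split() with no argument splits on Unicode whitespace.
words = "a\u2003b".split()

# str.splitlines() recognises Unicode line boundaries such as U+2028.
lines = "a\u2028b".splitlines()

print(bool(unicode_match), ascii_match, as_int, words, lines)
```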

Regards

Antoine.

Antoine Pitrou