print arabic characters

Ahmad eng_ak at link.net
Thu Dec 25 02:51:59 EST 2003


Hi all,

   I just wanted to tell everyone here that none of the tips really
worked. The RTLstreamer class seemed quite broken, and printing
"\u200F" before my text didn't make any difference! I can't believe
that, after all this time, Unicode and bidi support still isn't working
nicely :(

OTOH, I tried PyGTK. The text is automatically RTL (nice), but the
first character in the sentence still isn't showing.

Any other tricks?


"Martin v. Loewis" <martin at v.loewis.de> wrote in message news:<bs7kb5$9mv$01$1 at news.t-online.com>...
> Peter Otten wrote:
> > Disclaimer: As I know nothing about right-to-left printing languages, it's
> > likely that I have got it at least partially wrong.
> 
> Indeed. First of all, each Unicode character has a directionality,
> available as unicodedata.bidirectional; this is L, R, or AL for most
> characters; some characters have weak (EN, ES, ET, ...) or neutral
> (B, S, ...) directionality. You need to find runs of characters with
> the same directionality, extending each run into adjacent weak or
> neutral characters. Then you need to reverse only the RTL runs,
> leaving the LTR runs intact.
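
The run-finding step described above can be sketched roughly as follows. This is a minimal illustration in Python 3, not the full algorithm from TR9: the function names are made up for this example, the base direction is assumed to be LTR, and weak or neutral characters simply join the run that precedes them. Reversing by code point also misplaces combining marks, which this sketch ignores.

```python
import unicodedata

STRONG_RTL = {"R", "AL"}   # Hebrew-style and Arabic-style strong classes
STRONG_LTR = {"L"}

def directional_runs(text, base="L"):
    """Split text into (direction, substring) runs.

    Simplification (an assumption of this sketch): weak and neutral
    characters inherit the direction of the run before them.
    """
    runs = []
    current = base
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG_RTL:
            direction = "R"
        elif cls in STRONG_LTR:
            direction = "L"
        else:
            direction = current          # weak or neutral character
        if runs and runs[-1][0] == direction:
            runs[-1][1].append(ch)
        else:
            runs.append((direction, [ch]))
        current = direction
    return [(d, "".join(chars)) for d, chars in runs]

def to_visual(text):
    """Reverse only the RTL runs, leaving LTR runs untouched."""
    return "".join(s[::-1] if d == "R" else s
                   for d, s in directional_runs(text))
```

For example, `directional_runs("abc \u05d0\u05d1\u05d2")` yields one LTR run (including the trailing space) followed by one RTL run.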
> 
> Next, in the process of reversing, you may need to mirror weak LTR
> characters, replacing them with their unicodedata.mirrored character.
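
One detail worth noting when trying this in Python: `unicodedata.mirrored(ch)` only returns a flag (1 or 0) saying whether the character is mirrored in bidirectional text; it does not return the mirrored counterpart. The sketch below therefore assumes a small hand-written pair table (`MIRROR_PAIRS` is hypothetical and covers only a few ASCII brackets), and both function names are made up for this example.

```python
import unicodedata

# Hypothetical, deliberately incomplete table of mirrored pairs.
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}

def mirror_char(ch):
    """Return the mirrored counterpart if the character is flagged as mirrored."""
    if unicodedata.mirrored(ch):
        return MIRROR_PAIRS.get(ch, ch)   # unknown pairs are left as-is
    return ch

def reverse_rtl_run(run):
    """Reverse an RTL run, mirroring brackets so they still enclose the text."""
    return "".join(mirror_char(ch) for ch in reversed(run))
```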
> 
> Then, for AL runs, you need to replace European numerals with Arabic
> numerals (but keeping the LTR order).
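
As a rough illustration of the numeral step, assuming "Arabic numerals" here means the Arabic-Indic digits U+0660 to U+0669, a translation table is enough. The names `ARABIC_INDIC` and `to_arabic_indic` are invented for this sketch, and deciding which runs count as AL context is left to the caller.

```python
# Map ASCII digits 0-9 to ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669).
ARABIC_INDIC = {ord("0") + i: chr(0x0660 + i) for i in range(10)}

def to_arabic_indic(text):
    """Replace European digits, keeping their left-to-right order."""
    return text.translate(ARABIC_INDIC)
```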
> 
> Finally, and again for Arabic characters, you need to perform glyph
> shaping, replacing the first character of a word with the INITIAL
> FORM, the last character with the FINAL FORM, all other characters
> of a word with the MEDIAL FORM, and all remaining characters with
> the ISOLATED FORM. This, of course, assumes your font has glyphs
> for these available.
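
The presentation forms mentioned above can be looked up by name, since the Unicode names follow the pattern "ARABIC LETTER BEH INITIAL FORM". The helper below is only a sketch of that lookup (`presentation_form` is a made-up name); it does not implement real shaping, which also has to account for letters that do not join on one side, and it falls back to the original character when no such form exists.

```python
import unicodedata

def presentation_form(ch, form):
    """Look up a presentation form by name.

    form is one of "ISOLATED", "FINAL", "INITIAL", "MEDIAL".
    Returns ch unchanged if the character has no such form.
    """
    try:
        return unicodedata.lookup(f"{unicodedata.name(ch)} {form} FORM")
    except (KeyError, ValueError):
        # KeyError: no character with that name; ValueError: ch is unnamed.
        return ch
```

For instance, `presentation_form("\u0628", "INITIAL")` gives U+FE91, while ALEF has no initial form and is returned unchanged.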
> 
> This is specified in more detail in
> 
> http://www.unicode.org/reports/tr9/
> 
> > Can anybody point me to a way to iterate over characters with a varying
> > number of bytes?
> 
> There is no trivial algorithm. Your best option is to decode the string
> into Unicode, reverse it, then encode again to the original encoding.
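
In Python 3 terms, that decode/reverse/encode round trip might look like the sketch below. The function name and the UTF-8 default are assumptions for illustration; iterating over the decoded `str` also answers the question about stepping through characters of varying byte length.

```python
def reverse_encoded(data, encoding="utf-8"):
    """Reverse a byte string character by character, not byte by byte."""
    text = data.decode(encoding)      # bytes -> str (one item per code point)
    return text[::-1].encode(encoding)
```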
> 
> Regards,
> Martin



