transliteration in Python
Frederick H. Bartlett
fbartlet at optonline.net
Fri Jan 4 15:14:04 CET 2002
But transliteration isn't what encodings do. You are asking that an
encoding manage to handle variable-length ASCII strings and just know
what to do with them. Given your question, you must want the "shch" in
"shchi" to be replaced by the Cyrillic character Unicode knows as "shcha"
(0429/0449); meanwhile "sh" would be "sha" (0428/0448), "ch" would be
"che" (0427/0447), "k" would be "ka" (041A/043A), and "kh" would be "ha"
(0425/0445). That's a lot of intelligence to build into an encoding.
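
One way around this is to do the transliteration as a separate step and
only then hand the result to a codec. What follows is a minimal sketch,
not a definitive implementation: it assumes a current Python 3 rather
than the Python 2.1 in the question, the names TRANSLIT and
transliterate are made up for illustration, and the table covers only
the lowercase letters mentioned above plus "i". It tries the longest
key first, so "shch" wins over "sh" and "kh" wins over "k".

```python
# Hypothetical table: Latin letter groups -> lowercase Cyrillic.
# Only the letters discussed in this thread are included.
TRANSLIT = {
    "shch": "\u0449",  # shcha
    "sh": "\u0448",    # sha
    "ch": "\u0447",    # che
    "kh": "\u0445",    # ha
    "k": "\u043a",     # ka
    "i": "\u0438",     # i
}


def transliterate(text, table=TRANSLIT):
    """Replace Latin letter groups with Cyrillic, longest match first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    pos = 0
    while pos < len(text):
        for key in keys:
            if text.startswith(key, pos):
                out.append(table[key])
                pos += len(key)
                break
        else:
            # No table entry matches here; copy the character unchanged.
            out.append(text[pos])
            pos += 1
    return "".join(out)


cyrillic = transliterate("shchi")   # two Cyrillic letters: shcha, i
koi8 = cyrillic.encode("koi8_r")    # the codec only maps characters to bytes
```

The split is the point: the table carries the linguistic knowledge, and
the "koi8_r" codec that ships with Python stays a plain
character-to-byte mapping.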
My experience with encoding issues comes from TeX, where I used to
design my own encodings in .tfm files so that I could type Classical
Greek, Russian, and Georgian in American ASCII with reasonable clarity.
(Clear to me, anyway.) But in TeX one can use ligatures to accomplish
neat encoding effects; no other encoding I know of works that way.
Giorgi Lekishvili wrote:
> I am sorry to say this, but I was pretty confused exploring the
> encodings folder in the Python21 distribution.
> Can someone smarter than me explain the common syntax of transliteration
> of one encoding to another?
> Suppose we have string sl="shchi" and want to recode it in "KOIR-8". How
> can this be achieved?
> Thank you.