transliteration in Python

Martin von Loewis loewis at informatik.hu-berlin.de
Fri Jan 4 12:17:08 EST 2002


"Jason Orendorff" <jason at jorendorff.com> writes:

>   s1 = "shchi"            # start with some ascii bytes
>   u = s1.decode('ascii')  # decode them into a unicode string
>   s2 = u.encode('utf16')  # encode it as UTF-16 bytes
>   outfile.write(s2)       # write them to a binary file, for example
> 
> [In this case, I know that s1 is ascii-encoded, because I typed in the
> letters "shchi" and I know that those are all ascii characters, and Python
> and my computer both handle ASCII just fine by default.  

This is not what Giorgi wanted. He was asking for translateration,
i.e.  where shch is a latin read-alike of some cyrillic
letter. Transliteration is a common technique to display non-latin
languages with latin letters (often even without using accent marks).

> Python only supports a few encodings out of the box.  KOIR-8, the one
> you mentioned, apparently isn't one of them.

KOI8-R is supported.

Regards,
Martin



More information about the Python-list mailing list