[Tutor] Absolute newbie - Transliteration

David Rogers davidrogers@telus.net
Wed May 21 02:54:02 2003


Hi

I'm an absolute newbie - this is my first attempt with Python or any 
"real" language, so apologies in advance for any stupid comments.  I 
joined the list just to ask this question, after doing a little 
searching in the list archives and the documentation and not being able 
to find out what I want to know.

I'm trying to make scripts to transliterate a file from (Unicode) Cyrillic 
characters to each of
- Roman script, and
- International Phonetic Alphabet (more Unicode).

(Whether I end up with separate scripts, one for each transliteration, 
or one script for all with a bigger dictionary/list/table, is not 
important to me.)

The transliteration will not always be one-to-one in terms of the 
number of characters, for example the "ch" sound is one letter in 
Russian but corresponds to two letters in English.
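
To make the one-to-many point concrete, here is a minimal sketch of a 
per-character lookup in present-day Python 3.  The table ROMAN and the 
function transliterate are hypothetical names, and the three entries are 
only samples, not a complete or authoritative romanization scheme.

```python
# Hypothetical, deliberately incomplete table: Cyrillic letter -> Roman spelling.
ROMAN = {
    "ч": "ch",  # one Cyrillic letter becomes two Roman letters
    "а": "a",
    "й": "y",
}


def transliterate(text, table):
    """Replace each character found in table; leave all others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate("чай", ROMAN))  # chay
```

Because each character is looked up separately and the results are 
joined afterwards, the replacement can be any length, including empty.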

I have found the following in the Python web documentation...

> translate(table[, deletechars])
>
>
> Return a copy of the string where all characters occurring in the 
> optional argument deletechars are removed, and the remaining 
> characters have been mapped through the given translation table, which 
> must be a string of length 256.


...but I don't understand what format my table needs to be in, or even 
whether this accommodates Unicode or copes with one character sometimes 
translating to two.  If I'm completely on the wrong track here, 
somebody laugh now before it's too late.   :-)
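
For comparison, a hedged sketch of how the three questions look in 
present-day Python 3, where str.translate differs from the 256-character 
version quoted above: it accepts a mapping built by str.maketrans, works 
on Unicode text, and allows a replacement of any length.  The name 
IPA_TABLE, the function to_ipa, and the sample entries are assumptions 
for illustration only, not a vetted phonetic transcription.

```python
# Hypothetical sample entries; a real table would cover the whole alphabet.
# Keys must be single characters; values may be strings of any length.
IPA_TABLE = str.maketrans({
    "ч": "tɕ",  # one letter in, two characters out
    "а": "a",
    "й": "j",
})


def to_ipa(text):
    """Map each character through IPA_TABLE; unmapped characters pass through."""
    return text.translate(IPA_TABLE)


print(to_ipa("чай"))  # tɕaj
```

A second table for Roman script could be built the same way and passed 
to the same translate call, so one script can serve several 
transliterations.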


What I don't want is a pointer to a non-modifiable Cyrillic-to-Roman 
transliteration application, because I want to re-use what I do here 
when I make other transliteration tables to speed up IPA transcription 
from other languages too.  I love IPA.    :-)

On the other hand, if somebody has already done something like what I 
want, in a script I can modify for other uses, then I'm all ears.  
(Some of me is ears all the time.)  I'm happy to make the lists, 
dictionary entries, or whatever format they need to be in - I just want 
to know how to get Python to read this stuff and then give me back the 
right thing.

I'm using Mac OS X, if it makes any difference.


Many thanks
David Rogers