Character set conversion between mac and pc

Martin von Loewis loewis at informatik.hu-berlin.de
Sun Oct 1 04:58:00 EDT 2000


Pieter Claerhout <Pieter_Claerhout at CreoScitex.com> writes:

> Does anyone have a module which is able to convert text in a
> Macintosh character set to a Windows character set? If not, how would
> one accomplish this in Python? I think you will have to use the
> string.translate function, but I couldn't find out how this one
> works.

The critical piece of information here is the translation table: you
need to know which character in the Mac character set corresponds to
which one in the Windows character set. Such information is available
from many places, e.g. in /usr/share/i18n/charmaps on a Linux system,
or in the encodings directory of Python 2.

With this information, you construct a string of 256 characters. The
index into the string is the ordinal of a Mac character; the character
at that index is the corresponding Windows character. That string is
the table argument that string.translate expects. To actually
construct a table, you'd need to specify which Mac charset (Roman,
Greek, Cyrillic) and which Windows code page.
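
As a rough sketch of the table approach, the 256-entry table can be
generated from codec data rather than typed in by hand. This assumes a
current Python 3 interpreter (where 8-bit strings are bytes objects and
the method is bytes.translate), the Mac Roman charset, and Windows code
page 1252; build_table and MAC_TO_WIN are hypothetical names chosen for
illustration. Characters that have no Windows equivalent become "?".

```python
def build_table(source="mac_roman", target="cp1252"):
    """Return a 256-byte table mapping source bytes to target bytes.

    Assumes both charsets are single-byte; unmappable characters
    are replaced by b"?".
    """
    table = bytearray(256)
    for byte in range(256):
        # Decode one source byte to its Unicode character ...
        char = bytes([byte]).decode(source, errors="replace")
        # ... and look up that character in the target charset.
        table[byte] = char.encode(target, errors="replace")[0]
    return bytes(table)


MAC_TO_WIN = build_table()

# Index = ordinal of the Mac character, value = Windows character.
windows_text = b"caf\x8e".translate(MAC_TO_WIN)  # 0x8E is e-acute in Mac Roman
```

Once built, the table can be reused for any amount of text, which is
the main attraction of this approach over converting piece by piece.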

With Python 1.6/2.0, you can instead convert the original string to
Unicode, then encode it in the target code page:

mac = "..."                             # 8-bit string in Mac Roman
universal = unicode(mac, "mac-roman")   # decode to a Unicode string
windows = universal.encode("cp1252")    # encode as Windows code page 1252
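
One thing the round trip has to deal with is Mac characters that have
no counterpart in the Windows code page. The sketch below shows one
possible way to handle that, assuming a current Python 3 interpreter
(bytes.decode in place of unicode()) and a made-up sample string; the
errors="replace" policy is an illustrative choice, not the only option.

```python
# Sample Mac Roman input: "r\x8esum\x8e" is "resume" with e-acute (0x8E),
# and 0xAD is the not-equal sign, which cp1252 does not contain.
mac_bytes = b"r\x8esum\x8e \xad"

universal = mac_bytes.decode("mac-roman")

# The default (strict) error handling refuses unmappable characters.
try:
    universal.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors="replace" substitutes "?" for them instead of raising.
windows = universal.encode("cp1252", errors="replace")
```

Whether to replace, drop, or reject such characters depends on what the
converted text is used for.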

Regards,
Martin



