How to convert between Japanese coding systems?

Justin Ezequiel justin.mailinglists at gmail.com
Thu Feb 19 04:02:01 EST 2009


On Feb 19, 2:28 pm, Dietrich Bollmann <dir... at web.de> wrote:
> Are there any functions in python to convert between different Japanese
> coding systems?
>
> I would like to convert between (at least) ISO-2022-JP, UTF-8, EUC-JP
> and SJIS.  I also need some function to encode / decode base64 encoded
> strings.
>
> Example:
>
> email = '''[...]
> Subject:
> =?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
> =?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=
> [...]
> Content-Type: text/plain; charset=EUC-JP
> [...]
> Content-Transfer-Encoding: base64
> [...]
>
> cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K
>
> '''
>
>   from = contentType
>   to = 'utf-8'
>   contentUtf8 = convertCodingSystem(decodeBase64(content), from, to)
>
> The only problem is that I could not find any standard functionality to
> convert between different Japanese coding systems.
>
> Thanks,
>
> Dietrich Bollmann

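import base64

# Codec names understood by Python's standard codecs (lookup is case-insensitive)
ENCODINGS = ['ISO-2022-JP', 'UTF-8', 'EUC-JP', 'SJIS']

def decodeBase64(content):
    # b64decode works on Python 2 and 3; decodestring was removed in Python 3.9
    return base64.b64decode(content)

def convertCodingSystem(s, _from, _to):
    # Decode the source bytes to Unicode text, then re-encode in the target codec
    text = s.decode(_from)
    return text.encode(_to)

if __name__ == '__main__':
    content = 'cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K'
    _from = 'EUC-JP'
    for _to in ENCODINGS:
        x = convertCodingSystem(decodeBase64(content), _from, _to)
        # A single %-formatted argument prints the same on Python 2 and 3
        print('%s %r' % (_to, x))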
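
The Subject line in your example uses MIME encoded words rather than plain
base64, so it needs a different decoder. A rough sketch, assuming Python 3
and the standard email.header module; decode_subject and convert_body are
names made up for this example, not library functions:

```python
import base64
from email.header import decode_header, make_header

def decode_subject(raw_subject):
    """Decode the MIME encoded words (=?charset?Q?...?=) of a header into text."""
    return str(make_header(decode_header(raw_subject)))

def convert_body(b64_body, source_charset, target_charset='utf-8'):
    """Base64-decode a message body and re-encode it in another charset."""
    raw = base64.b64decode(b64_body)
    return raw.decode(source_charset).encode(target_charset)

if __name__ == '__main__':
    # Header and body copied from the example message in the question
    subject = decode_subject(
        '=?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?= '
        '=?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=')
    body = convert_body('cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K', 'EUC-JP')
    print(subject)
    print(repr(body))
```

For a whole message you would probably read the charset from the
Content-Type header instead of hard-coding 'EUC-JP' as done here.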



