i18n pot file

anabell anabell at sh163a.sta.net.cn
Sun Nov 2 01:37:03 EST 2003


I was unable to use charset utf-8: when I try to run my localized
application, I get this error message:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data

As an alternative, I tried writing a gb2312 codec, which does not ship
with the standard Python distribution.  I downloaded a gb2312 character
map and put it in a gb2312.py codec file.  It worked well (Simplified
Chinese characters are displayed).

I still wonder how I can make utf-8 work, since it is supposed to support
any language.  I looked in my python23/lib/encodings/ directory and found
that a utf-8 codec exists there.  I edited my .po file to set the charset
to utf_8 and generated its .mo file.  When I ran my localized application,
Python's gettext module recognized the charset 'utf8', but the error
occurred as soon as it started decoding the .mo file.
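
One possible explanation, offered as an assumption rather than a diagnosis:
the translated strings were typed in a GB2312 editor, and only the header
was changed to say utf-8.  The sketch below (modern Python 3, which ships
a gb2312 codec; the helper names are hypothetical) shows that GB2312 bytes
are rejected by the utf-8 decoder and that transcoding them first avoids
the error.

```python
# Assumption: the .po text was saved as GB2312 even though the
# header declares UTF-8.  "中文" stands in for a translated string.
sample_text = "中文"
gb_bytes = sample_text.encode("gb2312")


def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def transcode(data, source="gb2312", target="utf-8"):
    """Decode bytes from the source charset and re-encode in the target."""
    return data.decode(source).encode(target)


utf8_bytes = transcode(gb_bytes)

print(decodes_as_utf8(gb_bytes))    # GB2312 bytes fed to the utf-8 codec
print(decodes_as_utf8(utf8_bytes))  # the same text after transcoding
```

If this is what happened, the same transcoding would need to be applied to
the whole .po file before regenerating the .mo file.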

I opened the utf_8.py codec file and found no character map.  I wonder if
it's using the wrong map?
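
On the missing map: UTF-8 is computed from the Unicode code point by a
fixed bit-packing rule, so a codec for it needs no lookup table, unlike a
table-driven charset such as gb2312.  The function below is only an
illustrative sketch of that rule (the name is hypothetical, and it skips
validation such as rejecting surrogates); it is not how Python's own codec
is written.

```python
def utf8_encode_code_point(cp):
    """Encode one Unicode code point as UTF-8 using bit arithmetic only."""
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Compare the hand-rolled result with the built-in codec.
for ch in ["A", "é", "中"]:
    print(ch, utf8_encode_code_point(ord(ch)) == ch.encode("utf-8"))
```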


> It depends. The best CHARSET to use is UTF-8
> but of course you have to enter UTF-8 data into
> the po file. You can write all languages in UTF-8.
> There are charsets specific to language (groups)
> which you may prefer, for example:
>   Big5        for Traditional Chinese
>   gb2312      for Simplified Chinese
>   ISO-8859-15 for Western European languages
>
> The ENCODING should be 8bit in all cases
>
> The handiest thing to do is to look at examples:
> http://www2.iro.umontreal.ca/~gnutra/registry.cgi?team=zh_CN
>
> Maybe you could even practice on my application :-)
> http://www2.iro.umontreal.ca/~gnutra/registry.cgi?domain=fslint
>
> Pádraig.
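
The advice above amounts to replacing the two placeholders that pygettext
leaves in the header.  A minimal sketch of that substitution follows; the
function name and the default values are assumptions chosen to match the
reply (UTF-8 and 8bit), and the file contents must still really be encoded
in the declared charset.

```python
def fill_po_header(po_text, charset="UTF-8", encoding="8bit"):
    """Replace the CHARSET and ENCODING placeholders in a .pot header."""
    po_text = po_text.replace("charset=CHARSET", "charset=" + charset)
    po_text = po_text.replace(
        "Content-Transfer-Encoding: ENCODING",
        "Content-Transfer-Encoding: " + encoding,
    )
    return po_text


# The two relevant header lines as pygettext writes them.
template = (
    '"Content-Type: text/plain; charset=CHARSET\\n"\n'
    '"Content-Transfer-Encoding: ENCODING\\n"\n'
)

filled = fill_po_header(template)
print(filled)
```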


> > anabell wrote:
> > Hi, I'm trying to localize to Chinese.  In the pot file header,
> > there appears:
> >
> >     "POT-Creation-Date: Thu Oct 16 17:07:14 2003\n"
> >     "PO-Revision-Date: 2003-10-16 HO:MI+ZONE\n"
> >     "Last-Translator: Anabell chan <achan at mail.design.com>\n"
> >     "Language-Team: LANGUAGE <LL at li.org>\n"
> >     "MIME-Version: 1.0\n"
> >     "Content-Type: text/plain; charset=CHARSET\n"
> >     "Content-Transfer-Encoding: ENCODING\n"
> >     "Generated-By: pygettext.py 1.5\n"
> > What should I fill in for 'CHARSET' and 'ENCODING'?