a question about Chinese characters in a Python Program

Benjamin Kaplan benjamin.kaplan at case.edu
Mon Oct 20 18:54:38 CEST 2008


On Mon, Oct 20, 2008 at 12:44 PM, est <electronixtar at gmail.com> wrote:

> On Oct 20, 11:46 pm, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
> > On Mon, 20 Oct 2008 06:30:09 -0700, est wrote:
> > > Like I said, str() should NOT throw an exception BY DESIGN, it's a
> > > basic language standard.
> >
> > int() is also a basic language standard, but it is perfectly acceptable
> > for int() to raise an exception if you ask it to convert something into
> > an integer that can't be converted:
> >
> > int("cat")
> >
> > What else would you expect int() to do but raise an exception?
> >
> > If you ask str() to convert something into a string which can't be
> > converted, then what else should it do other than raise an exception?
> > Whatever answer you give, somebody else will argue it should do another
> > thing. Maybe I want failed characters replaced with '?'. Maybe Fred wants
> > failed characters deleted altogether. Susan wants UTF-16. George wants
> > Latin-1.
> >
> > The simple fact is that there is no 1:1 mapping from all 65,000+ Unicode
> > characters to the 256 bytes used by byte strings, so there *must* be an
> > encoding, otherwise you don't know which characters map to which bytes.
> >
> > ASCII has the advantage of being the lowest common denominator. Perhaps
> > it doesn't make too many people very happy, but it makes everyone equally
> > unhappy.
> >
> > > str() is not only a convert to string function, but
> > > also a serialization in most cases.(e.g. socket) My simple suggestion
> > > is: If it's a unicode character, output as UTF-8;
> >
> > Why UTF-8? That will never do. I want it output as UCS-4.
> >
> > > otherwise just output
> > > byte array, please do not encode it with really stupid range(128)
> > > ASCII. It's not guessing, it's totally wrong.
> >
> > If you start with a byte string, you can always get a byte string:
> >
> > >>> s = '\x96 \xa0 \xaa'  # not ASCII characters
> > >>> s
> > '\x96 \xa0 \xaa'
> > >>> str(s)
> > '\x96 \xa0 \xaa'
> >
> > --
> > Steven
>
> In fact Python handles characters better than most other open-source
> programming languages. But still:
>
> 1. You can explain str() in 1000 ways, but there are 1001 more confusing
> errors in all kinds of Python apps. (Not only some of the scripts I've
> written, but also well-known apps like Boa Constructor:
> http://i36.tinypic.com/1gqekh.jpg. This sucks hard, right?)
>
>
> 2. Could anyone kindly tell me how I can define a customized encoding
> (namely 'ansi') which handles range(256) so I can
> sys.setdefaultencoding('ansi') once and for all?

There is no such thing as the "ansi" encoding. The only encoding defined by
the American National Standards Institute is the 7-bit ASCII encoding that
Python uses by default. You are probably thinking of cp1252 (Windows-1252),
the Windows Western European code page, which Windows labels "ANSI" but which
isn't actually an ANSI standard.
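
To make the difference concrete, here is a rough sketch rather than a
recommendation. If what you want is something that "handles range(256)", the
codec that maps every byte value to exactly one character is latin-1, not
cp1252, which leaves a few byte values undefined in Python's codec. The
variable names below are made up for illustration, and I have written it with
explicit bytes literals so it should also run on Python 3, which is my own
assumption and not something discussed in this thread.

```python
# -*- coding: utf-8 -*-
# Sketch: compare latin-1 and cp1252, and show explicit encoding choices.

# Every possible byte value, 0x00 through 0xff.
all_bytes = bytes(bytearray(range(256)))

# latin-1 maps byte N to code point N, so decoding then encoding
# gives back exactly the bytes we started with.
latin1_round_trips = all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# cp1252 has holes: byte 0x81 has no character assigned in Python's codec.
try:
    b"\x81".decode("cp1252")
    cp1252_decodes_0x81 = True
except UnicodeDecodeError:
    cp1252_decodes_0x81 = False

# For Chinese text, pick the encoding explicitly instead of relying on
# the default. U+4E2D U+6587 is the word for "Chinese (language)".
chinese = u"\u4e2d\u6587"
utf8_bytes = chinese.encode("utf-8")

# If you really must produce ASCII, say what to do with the failures.
ascii_replaced = chinese.encode("ascii", "replace")

print(latin1_round_trips, cp1252_decodes_0x81)
print(repr(utf8_bytes), repr(ascii_replaced))
```

Note that latin-1 only guarantees the bytes survive a round trip; it does not
make Chinese text come out right unless the bytes really were latin-1 to begin
with. Encoding and decoding explicitly at the boundaries is the safer habit.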