<div dir="ltr"><br><br><div class="gmail_quote">On Mon, Oct 20, 2008 at 12:44 PM, est <span dir="ltr">&lt;<a href="mailto:electronixtar@gmail.com">electronixtar@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


On Oct 20, 11:46 pm, Steven D'Aprano &lt;st...@REMOVE-THIS-<a href="http://cybersource.com.au" target="_blank">cybersource.com.au</a>&gt; wrote:<br>


> On Mon, 20 Oct 2008 06:30:09 -0700, est wrote:<br>


> > Like I said, str() should NOT throw an exception BY DESIGN, it's a basic<br>


> > language standard.<br>


><br>


> int() is also a basic language standard, but it is perfectly acceptable<br>


> for int() to raise an exception if you ask it to convert something into<br>


> an integer that can't be converted:<br>


><br>


> int("cat")<br>


><br>


> What else would you expect int() to do but raise an exception?<br>


><br>


> If you ask str() to convert something into a string which can't be<br>


> converted, then what else should it do other than raise an exception?<br>


> Whatever answer you give, somebody else will argue it should do another<br>


> thing. Maybe I want failed characters replaced with '?'. Maybe Fred wants<br>


> failed characters deleted altogether. Susan wants UTF-16. George wants<br>


> Latin-1.<br>


><br>


> The simple fact is that there is no 1:1 mapping from all 65,000+ Unicode<br>


> characters to the 256 bytes used by byte strings, so there *must* be an<br>


> encoding, otherwise you don't know which characters map to which bytes.<br>


><br>


> ASCII has the advantage of being the lowest common denominator. Perhaps<br>


> it doesn't make too many people very happy, but it makes everyone equally<br>


> unhappy.<br>


><br>


> > str() is not only a convert-to-string function, but<br>


> > also a serialization function in most cases (e.g. for sockets). My simple suggestion<br>


> > is: If it's a unicode character, output as UTF-8;<br>


><br>


> Why UTF-8? That will never do. I want it output as UCS-4.<br>


><br>


> > otherwise just output<br>


> > a byte array; please do not encode it with the really stupid range(128) ASCII.<br>


> > It's not guessing, it's totally wrong.<br>


><br>


> If you start with a byte string, you can always get a byte string:<br>


><br>


> >>> s = '\x96 \xa0 \xaa'  # not ASCII characters<br>


> >>> s<br>


> '\x96 \xa0 \xaa'<br>


> >>> str(s)<br>


> '\x96 \xa0 \xaa'<br>


><br>


> --<br>


> Steven<br>


<br>


In fact Python handles characters better than most other open-source<br>


programming languages. But still:<br>


<br>


1. You can explain str() in 1000 ways, but there are still 1001 more confusing<br>


errors in all kinds of Python apps. (Not only in some of the scripts I've<br>


written, but also in well-known apps like Boa Constructor<br>


<a href="http://i36.tinypic.com/1gqekh.jpg" target="_blank">http://i36.tinypic.com/1gqekh.jpg</a>. This sucks hard, right?)<br>


<br>


<br>


2. Could anyone please tell me how I can define a customized encoding<br>


(namely 'ansi') which handles range(256) so I can<br>


sys.setdefaultencoding('ansi') once and for all?<br>


<font color="#888888">--<br>


<a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a><br>


</font></blockquote></div><br>There is no such thing as an "ansi" encoding. The only encoding defined by the American National Standards Institute is 7-bit ASCII, which Python uses by default. You are probably thinking of cp1252 (Windows-1252), the Windows Western European code page, which isn't actually an ANSI standard.<br>
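<br>To make the range(256) question concrete, here is a minimal sketch that checks which of the encodings mentioned in this thread can actually handle every byte value. It assumes Python 3, where <code>bytes</code> stands in for the byte strings discussed above; the helper names <code>undecodable_bytes</code> and <code>round_trips_all_bytes</code> are made up for illustration and are not part of any library.<br>

```python
# Sketch in Python 3 terms: `bytes` plays the role of the old byte string.
# Helper names below are hypothetical, chosen only for this illustration.
ALL_BYTES = bytes(range(256))


def undecodable_bytes(encoding):
    """Return the byte values that `encoding` cannot decode on their own."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


def round_trips_all_bytes(encoding):
    """True if all 256 byte values survive decode followed by encode."""
    try:
        return ALL_BYTES.decode(encoding).encode(encoding) == ALL_BYTES
    except UnicodeError:
        return False


if __name__ == "__main__":
    for name in ("ascii", "cp1252", "latin-1"):
        # Print the codec name, whether it round-trips every byte,
        # and how many single byte values it refuses to decode.
        print(name, round_trips_all_bytes(name), len(undecodable_bytes(name)))
```

Of the three codecs checked, only latin-1 maps every byte value to a character and back. Python's cp1252 codec leaves five byte values undefined, and ascii rejects everything from 128 upward. So if the goal is simply an encoding that accepts all of range(256), latin-1 already does that without defining anything new.<br>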


</div>