Python3.3 str() bug?

Helmut Jarausch jarausch at igpm.rwth-aachen.de
Fri Nov 9 14:13:58 CET 2012


On Fri, 09 Nov 2012 23:22:04 +1100, Chris Angelico wrote:

> On Fri, Nov 9, 2012 at 10:08 PM, Helmut Jarausch
> <jarausch at igpm.rwth-aachen.de> wrote:
>> For me it's not funny, at all.
> 
> His description "funny" was in reference to the fact that you
> described this as a bug. This is a heavily-used mature language; bugs
> as fundamental as you imply are unlikely to exist (consequences of
> design decisions there will be, but not outright bugs, usually);
> extraordinary claims require extraordinary evidence.

Just for the record:
I first discovered a real bug in Python 3 when using os.walk on a file system
containing non-ASCII characters in file names.

I also encountered very strange behavior (which I would still call a bug) when
trying to put non-ASCII characters in email headers.
This has only been solved satisfactorily in Python 3.3.
> 
>> Whenever Python3 encounters a bytestring it needs an encoding to convert it to
>> a string. If I feed a list of bytestrings or a list of lists of bytestrings to
>> 'str', etc., it should use the encoding for each bytestring component of the
>> given data structure.
>>
>> How can I convert a data structure of arbitrarily complex nature, which contains
>> bytestrings somewhere, to a string?
> 
> Okay, now we're getting somewhere.
> 
> What you really should be doing is not transforming the whole
> structure, but explicitly transforming each part inside it. I
> recommend you stop fighting the language and start thinking about your
> data as either *bytes* or *characters* and using the appropriate data
> types (bytes or str) everywhere. You'll then find that it makes
> perfect sense to explicitly translate (en/decode) from one to another,
> but it doesn't make sense to encode a list in UTF-8 or decode a
> dictionary from Latin-1.
> 
>> This problem has arisen while converting a working Python2 script to Python3.3.
>> Since Python2 doesn't have bytestrings it just works.
> 
> Actually it does; it just calls them "str". And there's a Unicode
> string type, called "unicode", which is (more or less) the thing that
> Python 3 calls "str".
> 
> You may be able to do some kind of recursive cast that, in one sweep
> of your data structure, encodes all str objects into bytes using a
> given encoding (or the reverse thereof). But I don't think this is the
> best way to do things.

Thanks, but in my case the (complex) object is returned via ctypes from the
aspell library, so I cannot control which parts of it are bytestrings.
I still think that a standard function in Python3 which is able to 'stringify'
objects should take an encoding parameter.
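
For illustration, the recursive conversion mentioned above could be sketched
roughly as follows. This is only a minimal sketch: decode_nested is a
hypothetical helper name (not a standard library function), and it assumes
that every bytestring in the structure uses the same encoding and that the
containers are plain lists, tuples, sets, and dicts.

```python
def decode_nested(obj, encoding="utf-8", errors="strict"):
    """Return a copy of obj with every bytes/bytearray decoded to str.

    Hypothetical helper: assumes one encoding for all bytestrings and
    only handles the built-in container types listed below.
    """
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj).decode(encoding, errors)
    if isinstance(obj, dict):
        # Decode both keys and values.
        return {
            decode_nested(key, encoding, errors): decode_nested(value, encoding, errors)
            for key, value in obj.items()
        }
    if isinstance(obj, (list, tuple, set, frozenset)):
        # Rebuild the same container type from the decoded items.
        return type(obj)(decode_nested(item, encoding, errors) for item in obj)
    # Anything else (int, str, None, ...) is returned unchanged.
    return obj


data = [b"caf\xe9", (b"abc", 1), {b"key": [b"value"]}]
text = str(decode_nested(data, encoding="latin-1"))
```

With such a helper, str() is only ever applied to data that already consists
of str objects, so the encoding is stated once, explicitly, at the point of
decoding. Subclasses such as namedtuples would need extra handling.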

Thanks,
Helmut.


