Python3.3 str() bug?
oscar.j.benjamin at gmail.com
Sat Nov 10 17:45:26 CET 2012
On 9 November 2012 11:08, Helmut Jarausch <jarausch at igpm.rwth-aachen.de> wrote:
> On Fri, 09 Nov 2012 10:37:11 +0100, Stefan Behnel wrote:
>> Helmut Jarausch, 09.11.2012 10:18:
>>> probably I'm missing something.
>>> Using str(Arg) works just fine if Arg is a list.
>>> But str(Arg, encoding='latin-1') gives the error
>>> TypeError: coercing to str: need bytes, bytearray or buffer-like object,
>>> list found
>>> If this isn't a bug how can I use str(Arg,encoding='latin-1') in general.
>>> Do I need to flatten any data structure which is normally accepted by str() ?
>> Funny idea to call this a bug in Python. What your code is asking for is to
>> decode the object you pass in using the "latin-1" encoding. Since a list is
>> not something that is "encoded", let alone in latin-1, you get an error,
>> and actually a rather clear one.
>> Note that this is not specific to Python3.3 or even 3.x. It's the same
>> thing in Py2 when you call the equivalent unicode() function.
> For me it's not funny, at all.
I think the problem is that the str constructor does two fundamentally
different things depending on whether you have supplied the encoding
argument. From help(str) in Python 3.2:
| str(object[, encoding[, errors]]) -> str
| Create a new string object from the given object. If encoding or
| errors is specified, then the object must expose a data buffer
| that will be decoded using the given encoding and error handler.
| Otherwise, returns the result of object.__str__() (if defined)
| or repr(object).
| encoding defaults to sys.getdefaultencoding().
| errors defaults to 'strict'.
So str(obj) returns obj.__str__() but str(obj, encoding='xxx') decodes
a byte string (or a similar object) using a given encoding. In most
cases obj will be a byte string and it will be equivalent to using
obj.decode('xxx').
I think the help text is a little confusing. It says that encoding
defaults to sys.getdefaultencoding() but doesn't clarify that this only
applies if errors is given as a keyword argument since otherwise no
decoding is performed. Perhaps the help text would be clearer if it
listed the two operations as two separate cases e.g.:
str(object)
Returns a string object from object.__str__() if it is defined or
otherwise object.__repr__(). Raises TypeError if the returned result
is not a string object.
str(bytes, [encoding[, errors]])
If either encoding or errors is supplied, creates a new string
object by decoding bytes with the specified encoding. The bytes
argument can be any object that supports the buffer interface.
encoding defaults to sys.getdefaultencoding() and errors defaults to
'strict'.
> Whenever Python3 encounters a bytestring it needs an encoding to convert it to
> a string.
Well actually Python 3.3 will happily convert it to a string using
bytes.__repr__ if you don't supply the encoding argument:
>>> str(b'this is a byte string')
"b'this is a byte string'"
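To make the two behaviours concrete, here is a small sketch. The byte string raw is made up for illustration and is assumed to be latin-1 encoded; nothing here is specific to 3.3 as far as I know.

```python
# Illustrative only: `raw` is a made-up latin-1 encoded byte string.
raw = b'caf\xe9'

# No encoding argument: str() falls back to bytes.__repr__().
as_repr = str(raw)

# With an encoding argument: str() decodes, just like bytes.decode().
decoded = str(raw, encoding='latin-1')

print(as_repr)
print(ascii(decoded))
print(decoded == raw.decode('latin-1'))

# Passing a non-bytes object together with an encoding raises TypeError.
try:
    str([raw], encoding='latin-1')
    raised = False
except TypeError:
    raised = True
print(raised)
```

The last case is the one from the original post: a list is not something that can be decoded, so the encoding argument cannot apply to it.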
> If I feed a list of bytestrings or a list of list of bytestrings to
> 'str' , etc, it should use the encoding for each bytestring component of the
> given data structure.
You can always do:
[str(obj, encoding='xxx') for obj in list_of_byte_strings]
> How can I convert a data structure of arbitrarily complex nature, which contains
> bytestrings somewhere, to a string?
Using str(obj) or repr(obj). Of course this relies on the author of
type(obj) defining the appropriate methods and writing the code that
actually converts the object into a string.
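As a rough sketch of what I mean by the author defining the methods: the Record class below is hypothetical, and latin-1 is only an assumed encoding for its data.

```python
class Record:
    """Hypothetical container holding an encoded name (latin-1 assumed)."""

    def __init__(self, name_bytes, encoding='latin-1'):
        self.name_bytes = name_bytes
        self.encoding = encoding

    def __str__(self):
        # The class author decides how the bytes become text.
        return 'Record(name={})'.format(self.name_bytes.decode(self.encoding))

    def __repr__(self):
        # The debugging form keeps the raw bytes visible.
        return 'Record({!r})'.format(self.name_bytes)


rec = Record(b'caf\xe9')
print(ascii(str(rec)))    # uses Record.__str__
print(repr(rec))          # uses Record.__repr__
print(str([rec]))         # containers show the repr() of their elements
```

Note that str() of a list uses repr() on each element, which is why byte strings inside a list keep their b prefix.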
> This problem has arisen while converting a working Python2 script to Python3.3.
> Since Python2 doesn't have bytestrings it just works.
In Python 2 ordinary strings are byte strings.
> Tell me how to convert str(obj) from Python2 to Python3 if obj is an
> arbitrarily complex data structure containing bytestrings somewhere
> which have to be converted to strings with a given encoding?
The str function when used to convert a non-string object into a
string knows nothing about the object you provide except whether it
has __str__ or __repr__ methods. The only processing that is done is
to check that the returned result was actually a string:
>>> class A:
...     def __str__(self):
...         return []
...
>>> a = A()
>>> str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type list)
Perhaps it would help if you would explain why you want the string
object. I would only use str(complex_object) as something to print for
debugging so I would actually want it to show me which strings were
byte strings by marking them with a 'b' prefix and I would also want
it to show non-ascii characters with a \x hex code as it already does:
>>> a = [1, 2, b'caf\xe9']
>>> str(a)
"[1, 2, b'caf\\xe9']"
If I wanted to convert the object to a string in order to e.g. save it
to a file or database then I would write a function to create the
string that I wanted. I would only use str() to convert elementary
types like int and float into strings.
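Such a function might look something like the sketch below. The name decode_nested is my own invention, it only knows about the built-in containers, and it assumes every byte string in the structure uses the same encoding.

```python
def decode_nested(obj, encoding='latin-1', errors='strict'):
    """Return a copy of obj with every bytes/bytearray decoded to str.

    Hypothetical helper: handles dict, list, tuple, set and frozenset
    only; any other object is returned unchanged.
    """
    if isinstance(obj, (bytes, bytearray)):
        return str(obj, encoding, errors)
    if isinstance(obj, dict):
        return {decode_nested(k, encoding, errors):
                decode_nested(v, encoding, errors)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        # Rebuild the same container type from the decoded items.
        return type(obj)(decode_nested(item, encoding, errors)
                         for item in obj)
    return obj


data = [1, 2, b'caf\xe9', {'key': (b'abc', [b'\xfc'])}]
decoded = decode_nested(data)
text = str(decoded)
print(ascii(text))
```

Subclasses with unusual constructors (namedtuples, for example) would need their own handling, and the right encoding is something only the caller can know.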