Python3.3 str() bug?

Oscar Benjamin oscar.j.benjamin at
Sat Nov 10 17:45:26 CET 2012

On 9 November 2012 11:08, Helmut Jarausch <jarausch at> wrote:
> On Fri, 09 Nov 2012 10:37:11 +0100, Stefan Behnel wrote:
>> Helmut Jarausch, 09.11.2012 10:18:
>>> probably I'm missing something.
>>> Using   str(Arg) works just fine if  Arg is a list.
>>> But
>>>   str([],encoding='latin-1')
>>> gives the error
>>> TypeError: coercing to str: need bytes, bytearray or buffer-like object,
>>>            list found
>>> If this isn't a bug how can I use str(Arg,encoding='latin-1') in general.
>>> Do I need to flatten any data structure which is normally accepted by str() ?
>> Funny idea to call this a bug in Python. What your code is asking for is to
>> decode the object you pass in using the "latin-1" encoding. Since a list is
>> not something that is "encoded", let alone in latin-1, you get an error,
>> and actually a rather clear one.
>> Note that this is not specific to Python3.3 or even 3.x. It's the same
>> thing in Py2 when you call the equivalent unicode() function.
> For me it's not funny, at all.

I think the problem is that the str constructor does two fundamentally
different things depending on whether you have supplied the encoding
argument. From help(str) in Python 3.2:

 |  str(object[, encoding[, errors]]) -> str
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.

So str(obj) returns obj.__str__() but str(obj, encoding='xxx') decodes
a byte string (or a similar object) using a given encoding. In most
cases obj will be a byte string and it will be equivalent to using
obj.decode('xxx').

I think the help text is a little confusing. It says that encoding
defaults to sys.getdefaultencoding() but doesn't clarify that this only
applies if errors is given as a keyword argument since otherwise no
decoding is performed. Perhaps the help text would be clearer if it
listed the two operations as two separate cases e.g.:

str(object)
  Returns a string object from object.__str__() if it is defined or
otherwise object.__repr__(). Raises TypeError if the returned result
is not a string object.

str(bytes, [encoding[, errors]])
  If either encoding or errors is supplied, creates a new string
object by decoding bytes with the specified encoding. The bytes
argument can be any object that supports the buffer interface.
encoding defaults to sys.getdefaultencoding() and errors defaults to
'strict'.

> Whenever Python3 encounters a bytestring it needs an encoding to convert it to
> a string.

Well actually Python 3.3 will happily convert it to a string using
bytes.__repr__ if you don't supply the encoding argument:

>>> str(b'this is a byte string')
"b'this is a byte string'"
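
To make the contrast between the two behaviours concrete, here is a
minimal sketch. The sample bytes and the variable names are chosen only
for illustration; latin-1 is the encoding from the original question.

```python
data = b'caf\xe9'  # the text 'cafe' with an acute accent, encoded as latin-1

# No encoding argument: str() falls back to bytes.__repr__(),
# so nothing is decoded and the b'...' wrapper becomes part of the string.
as_repr = str(data)                      # "b'caf\\xe9'"

# With an encoding argument: the buffer is decoded into text.
as_text = str(data, encoding='latin-1')  # 'caf\xe9', a 4-character str

# For a bytes object, decoding via str() matches bytes.decode().
same = (as_text == data.decode('latin-1'))  # True
```

The first call never looks at the contents as text at all, which is why
it cannot accept an encoding for arbitrary objects such as lists.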

> If I feed a list of bytestrings or a list of list of bytestrings to
> 'str' , etc, it should use the encoding for each bytestring component of the
> given data structure.

You can always do:

[str(obj, encoding='xxx') for obj in list_of_byte_strings]

> How can I convert a data structure of arbitrarily complex nature, which contains
> bytestrings somewhere, to a string?

Using str(obj) or repr(obj). Of course this relies on the author of
type(obj) defining the appropriate methods and writing the code that
actually converts the object into a string.
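
As a rough sketch of what "defining the appropriate methods" could look
like, the class below decodes its own byte string inside __str__. The
class name Record, its attribute, and the choice of latin-1 are all
assumptions made up for this example, not anything from the original
script.

```python
class Record:
    """Hypothetical container that stores a name as latin-1 encoded bytes."""

    def __init__(self, raw_name):
        self.raw_name = raw_name

    def __str__(self):
        # The author of the type decides how its bytes become text.
        name = str(self.raw_name, encoding='latin-1')
        return 'Record(name={})'.format(name)


label = str(Record(b'Zo\xeb'))  # 'Record(name=Zo\xeb)' as a str
```

Here str(obj) needs no encoding argument because the decoding policy
lives in the type itself.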

> This problem has arisen while converting a working Python2 script to Python3.3.
> Since Python2 doesn't have bytestrings it just works.

In Python 2 ordinary strings are byte strings.

> Tell me how to convert  str(obj) from Python2 to Python3 if obj is an
> arbitrarily complex data structure containing bytestrings somewhere
> which have to be converted to strings with a given encoding?

The str function when used to convert a non-string object into a
string knows nothing about the object you provide except whether it
has __str__ or __repr__ methods. The only processing that is done is
to check that the returned result was actually a string:

>>> class A:
...   def __str__(self):
...     return []
>>> a = A()
>>> str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type list)

Perhaps it would help if you would explain why you want the string
object. I would only use str(complex_object) as something to print for
debugging so I would actually want it to show me which strings were
byte strings by marking them with a 'b' prefix and I would also want
it to show non-ascii characters with a \x hex code as it already does:

>>> a = [1, 2, b'caf\xe9']
>>> str(a)
"[1, 2, b'caf\\xe9']"

If I wanted to convert the object to a string in order to e.g. save it
to a file or database then I would write a function to create the
string that I wanted. I would only use str() to convert elementary
types like int and float into strings.
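
A function of that kind might look roughly like the sketch below. The
name decode_nested, the default encoding, and the set of container types
it walks (list, tuple, set, dict) are assumptions for illustration; a
real script would handle whatever types it actually contains.

```python
def decode_nested(obj, encoding='latin-1'):
    """Return a copy of obj with every bytes object decoded to str.

    Hypothetical helper: only list, tuple, set and dict are walked;
    any other object is returned unchanged.
    """
    if isinstance(obj, (bytes, bytearray)):
        return str(obj, encoding=encoding)
    if isinstance(obj, dict):
        return {decode_nested(key, encoding): decode_nested(value, encoding)
                for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        # Rebuild the same container type from the decoded items.
        return type(obj)(decode_nested(item, encoding) for item in obj)
    return obj


data = [1, 2, b'caf\xe9', (b'na\xefve', {'key': [b'\xfcber']})]
decoded = decode_nested(data)
text = str(decoded)  # no b'...' prefixes remain in the result
```

Decoding first and calling str() afterwards keeps the two jobs separate:
the helper decides which encoding applies, and str() only formats the
already-decoded structure.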

