harmful str(bytes)

Mon Oct 11 18:20:29 EDT 2010

On Mon, 11 Oct 2010 21:50:32 +0200
Hallvard B Furuseth <h.b.furuseth at usit.uio.no> wrote:
> 
> I'd just posted an example in article <hbf.20101008cg74 at bombur.uio.no>:
> 
> urllib.parse.urlunparse(('', '', '/foo', b'bar', '', '')) returns
> "/foo;b'bar'" instead of raising an exception or returning 2.6's correct
> "/foo;bar".

Oh, this looks like a bug in urllib.parse. Could you report it at
http://bugs.python.org? Thanks.

> > But if you already have str objects, you don't have to
> > call str() or format them using "%s", so implicit __str__ calls are
> > avoided.
> 
> Except it's quite normal to output strings with %s.

"%s" will take the string representation of anything you give it:
bytes, but also files, sockets, dicts, tuples, etc. So, if you're
using "%s" somewhere, it's your job to ensure that you give it the
desired type.
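
To illustrate, here is a minimal sketch assuming Python 3 run without the
-b/-bb options; the variable names are made up for the example:

```python
# In Python 3, "%s" and str() fall back to repr() for bytes objects.
param = b"bar"

implicit = "/foo;%s" % param                   # repr of the bytes ends up in the text
explicit = "/foo;%s" % param.decode("ascii")   # the caller chooses the decoding

# "%s" accepts any other object just as silently.
as_list = "%s" % ([1, 2],)
as_dict = "%s" % ({"a": 1},)

print(implicit)   # /foo;b'bar'
print(explicit)   # /foo;bar
print(as_list)    # [1, 2]
print(as_dict)    # {'a': 1}
```

Running the interpreter with -b turns the implicit str(bytes) case into a
BytesWarning, which helps in tracking down such call sites.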

> Maybe also to depend on the fact that str.__str__() is a
> no-op, so one can call str() just in case some variable needs to be
> unpacked to a plain string.

Well, if you don't know what types you are currently handling and
convert them to strings "just in case", chances are you're doing
something wrong.

> > 2) some unicode objects didn't have a successful str()
> >
> > Python 3 fixes both these issues. Fixing 1) means there's no automatic
> > coercion when trying to mix bytes and unicode.
> 
> Fine, so programs will have to do it themselves...

That's exactly the point, yes :) It's not Python's job to guess how
bytes you got from, e.g., a socket should be decoded.
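
A rough sketch of what "doing it themselves" can look like, assuming
Python 3. The helper name to_text and its UTF-8 default are hypothetical,
picked only for this example:

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return a str, decoding bytes explicitly."""
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        # The caller states the encoding; nothing is guessed.
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


# Mixing str and bytes is refused instead of being coerced silently.
try:
    "/foo;" + b"bar"
    mixing_failed = False
except TypeError:
    mixing_failed = True

path = "/foo;" + to_text(b"bar", "ascii")
print(mixing_failed)   # True
print(path)            # /foo;bar
```

Unlike a blanket str() call, this raises on unexpected types rather than
embedding their repr in the output.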

> > (...)
> > And fixing 2) means bytes objects get a meaningful str() in all
> > circumstances, which is much better for debug output.
> 
> Except str() on such data has a different meaning than it did before,

Yes, it's Python 3, and it's incompatible with Python 2!

Regards

Antoine.