harmful str(bytes)
Antoine Pitrou
solipsis at pitrou.net
Mon Oct 11 18:20:29 EDT 2010
On Mon, 11 Oct 2010 21:50:32 +0200
Hallvard B Furuseth <h.b.furuseth at usit.uio.no> wrote:
>
> I'd just posted an example in article <hbf.20101008cg74 at bombur.uio.no>:
>
> urllib.parse.urlunparse(('', '', '/foo', b'bar', '', '')) returns
> "/foo;b'bar'" instead of raising an exception or returning 2.6's correct
> "/foo;bar".
Oh, this looks like a bug in urlparse. Could you report it at
http://bugs.python.org ? Thanks.
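
Until that is resolved, a caller can guard against the mix-up on their own side. The following is only a sketch: strict_urlunparse is a hypothetical wrapper name, not part of urllib, and it assumes every component is expected to be a str.

```python
from urllib.parse import urlunparse


def strict_urlunparse(components):
    """Hypothetical wrapper: reject any non-str component before joining."""
    parts = tuple(components)
    for index, part in enumerate(parts):
        if not isinstance(part, str):
            raise TypeError(
                "component %d is %s, expected str" % (index, type(part).__name__)
            )
    return urlunparse(parts)


# All-str input is joined as usual.
print(strict_urlunparse(("", "", "/foo", "bar", "", "")))

# A bytes component is refused instead of being formatted as "b'bar'".
try:
    strict_urlunparse(("", "", "/foo", b"bar", "", ""))
except TypeError as exc:
    print("rejected:", exc)
```

The check runs before any formatting happens, so a stray bytes object fails loudly at the call site rather than ending up inside the URL.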
> > But if you already have str objects, you don't have to
> > call str() or format them using "%s", so implicit __str__ calls are
> > avoided.
>
> Except it's quite normal to output strings with %s.
"%s" will take the string representation of anything you give it:
bytes, but also files, sockets, dicts, tuples, etc. So if you're
using "%s" somewhere, it's your job to ensure that you give it the
desired type.
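
A quick illustration of the point, run under Python 3. The sample values are made up for the example; each one is wrapped in a one-element tuple so that the tuple sample is formatted as a single value.

```python
# Arbitrary sample values: "%s" accepts all of them without complaint.
samples = [b"bar", ("a", 1), {"k": "v"}, "plain"]

# Wrapping each value in a 1-tuple keeps "%" from unpacking the tuple sample.
formatted = ["%s" % (value,) for value in samples]

for value, text in zip(samples, formatted):
    print(type(value).__name__, "->", text)
```

Only the last sample comes out as the text it contains; the bytes object is rendered with its b'...' wrapper. Running the interpreter with the -b option makes it warn when bytes are converted to str without an encoding, which can help locate such spots.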
> Maybe also to depend on the fact that str.__str__() is a
> noop, so one can call str() just in case some variable needs to be
> unpacked to a plain string.
Well, if you don't know what types you are currently handling and
convert them to strings "just in case", chances are you're doing
something wrong.
> > 2) some unicode objects didn't have a successful str()
> >
> > Python 3 fixes both these issues. Fixing 1) means there's no automatic
> > coercion when trying to mix bytes and unicode.
>
> Fine, so programs will have to do it themselves...
That's exactly the point, yes :) It's not Python's job to guess how some
bytes you got e.g. on a socket should be decoded.
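
A small sketch of what that means in practice. The bytes value stands in for data read from a socket, and the two encodings are chosen only to show that the same bytes decode to different text.

```python
# Stand-in for bytes received from a socket: "cafe" with an accented e, in UTF-8.
raw = "caf\u00e9".encode("utf-8")

# Python 3 refuses to mix str and bytes implicitly.
try:
    mixed = "name: " + raw
except TypeError:
    mixed = None

# The program has to state the encoding, and the choice changes the result.
as_utf8 = raw.decode("utf-8")      # 4 characters, the intended text
as_latin1 = raw.decode("latin-1")  # 5 characters, mojibake

print(repr(as_utf8), repr(as_latin1))
```

Both decode() calls succeed, which is why the interpreter cannot pick one on the program's behalf.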
> > (...)
> > And fixing 2) means bytes objects get a meaningful str() in all
> > circumstances, which is much better for debug output.
>
> Except str() on such data has a different meaning than it did before,
Yes, it's Python 3 and it's incompatible with Python 2... !
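
For reference, a minimal comparison of the two str() call forms in Python 3, using made-up byte values:

```python
data = b"bar"

# One-argument form: a debugging representation, never a decoding step.
debug_form = str(data)

# Passing an encoding asks for an actual decode.
decoded = str(data, "ascii")

# The one-argument form also works for bytes that are not valid text.
undecodable = str(b"\xff")

print(debug_form, decoded, undecodable)
```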
Regards
Antoine.