[Python-3000] What to do about "".join([b""])?

Christian Heimes lists at cheimes.de
Fri Nov 2 09:40:46 CET 2007


Guido van Rossum wrote:
> Currently (in 3.0), "".join(<seq>) automatically applies str() to the
> items of <seq>, *except* if the item is a bytes instance -- then it
> raises a TypeError. Is that proper behavior? The alternative is to
> uniformly apply str(), which for bytes returns a string of the form
> "b'...'" or "buffer(b'...')" (depending on whether the bytes are
> immutable or not). Given that we killed the exception for "" == b""
> earlier, I'm tempted to remove the exception. Any opinions to the
> contrary?

-1

In Python 2.x the implicit encoding of a string with
sys.getdefaultencoding() caused me more than one headache. I fear the
implicit conversion of a byte sequence to its representation may cause
similar problems. If we take one step down that road, we can't go back.
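
To make the concern concrete, here is a minimal sketch of the alternative Guido describes: uniformly applying str() to every item before joining. The helper name `join_with_str` is hypothetical. It runs on released Python 3, where str() of a bytes object likewise returns its repr, not its decoded text.

```python
def join_with_str(sep, items):
    # Hypothetical emulation of the proposal: call str() on every item,
    # bytes included, and join the results.
    return sep.join(str(item) for item in items)

# str() of a bytes object is its repr, so the b'...' wrapper ends up
# inside the joined text without any error being raised.
print(join_with_str("", ["abc", b"def"]))   # abcb'def'
```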

''.join() could grow an encoding argument, but that's ugly, too.
''.join(s if isinstance(s, str) else str(s, 'utf-8') for s in seq) works
for me. :)
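
The one-liner, with the `isinstance` check completed, can be wrapped in a small helper. This is a minimal sketch assuming UTF-8 as the default; the name `join_decoded` and its `encoding` parameter are hypothetical, not part of any proposed API.

```python
from typing import Iterable, Union

def join_decoded(sep: str,
                 items: Iterable[Union[str, bytes, bytearray]],
                 encoding: str = "utf-8") -> str:
    # str items pass through unchanged; every other item is decoded
    # explicitly with `encoding`, just like str(s, 'utf-8') in the
    # one-liner. Undecodable bytes raise UnicodeDecodeError and
    # non-bytes-like items (e.g. int) raise TypeError, instead of a
    # b'...' repr slipping silently into the result.
    return sep.join(s if isinstance(s, str) else str(s, encoding)
                    for s in items)

print(join_decoded("-", ["a", b"b", bytearray(b"c")]))   # a-b-c
```

Passing `encoding` explicitly keeps the choice of codec at the call site, which gets the effect of an encoding argument without adding one to ''.join() itself.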

However, I'd like b''.join, buffer().join and the other methods to accept
both buffers and bytes. I don't see a reason why they shouldn't accept
both.

>>> b"".join((b'1', b'2'))
b'12'
>>> b"".join((buffer(b'1'), buffer(b'2')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected string, buffer found
>>> buffer().join((buffer(b'1'), buffer(b'2')))
buffer(b'12')
>>> buffer().join((b'1', b'2'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only join an iterable of bytes (item 0 has type 'bytes')
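
For comparison, in the Python 3 that was eventually released, `buffer` was renamed `bytearray`. `str.join` rejects every non-str item (bytes included) instead of calling str(), and both `bytes.join` and `bytearray.join` accept a mix of bytes and bytearray items. The following is a quick check, assuming Python 3.x; the helper `str_join_accepts` is hypothetical and exists only for the demonstration.

```python
def str_join_accepts(item):
    """Return True if "".join() accepts item, False on TypeError."""
    try:
        "".join([item])
        return True
    except TypeError:
        return False

print(str_join_accepts("x"))    # True
print(str_join_accepts(b"x"))   # False: no implicit str() or decode
print(str_join_accepts(1))      # False: no implicit str() for any type

# Both binary join methods accept either binary type; the result type
# follows the separator.
joined_bytes = b"".join((bytearray(b"1"), bytearray(b"2")))
joined_buffer = bytearray().join((b"1", b"2"))
print(joined_bytes)    # b'12'
print(joined_buffer)   # bytearray(b'12')
```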

Christian

