[Python-Dev] bytes / unicode

Nick Coghlan ncoghlan at gmail.com
Thu Jun 24 17:25:18 CEST 2010


On Fri, Jun 25, 2010 at 12:33 AM, Guido van Rossum <guido at python.org> wrote:
> Also, IMO a polymorphic function should *not* accept *mixed*
> bytes/text input -- join('x', b'y') should be rejected. But join('x',
> 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make sense to me.

A policy of allowing arguments to be either str or bytes, but not a
mixture, actually avoids one of the more painful aspects of the 2.x
"promote mixed operations to unicode" approach. Specifically, you
either had to scan all the arguments up front to check for unicode, or
else you had to stop what you were doing and start again with the
unicode version if you encountered unicode partway through. Neither
was particularly nice to implement.
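
As a rough illustration of that policy (only a sketch: the join()
below is a hypothetical stand-in for the function in Guido's example,
not an existing API, and the '/' separator is taken from his expected
results), a single type check up front is enough, with no rescanning
and no restarting:

```python
def join(a, b):
    """Hypothetical two-argument join: all str or all bytes, never mixed."""
    if isinstance(a, str) and isinstance(b, str):
        sep = '/'
    elif isinstance(a, bytes) and isinstance(b, bytes):
        sep = b'/'
    else:
        # Mixed (or unsupported) input is rejected rather than promoted.
        raise TypeError("join() arguments must be all str or all bytes, "
                        "not a mixture")
    return a + sep + b
```

With this, join('x', 'y') gives 'x/y', join(b'x', b'y') gives b'x/y',
and join('x', b'y') raises TypeError.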

As you noted elsewhere, literals and string methods are still likely
to be a major sticking point with that approach - common operations
like ''.join(seq) and b''.join(seq) aren't polymorphic, so functions
that use them won't be polymorphic either. (It's only the str->unicode
promotion behaviour in 2.x that works around this problem there.)
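
One possible way around the literal problem (again only a sketch;
concat() is a hypothetical helper name, and it assumes the sequence is
non-empty so there is something to infer the type from) is to derive
the empty value from the data instead of writing '' or b'' in the
source:

```python
def concat(seq):
    """Hypothetical helper: join str or bytes items without a hard-coded literal."""
    items = list(seq)
    if not items:
        raise ValueError("need at least one item to infer str vs bytes")
    # Slicing to zero length yields '' for str and b'' for bytes.
    empty = items[0][:0]
    # join() itself raises TypeError if the items are a str/bytes mixture.
    return empty.join(items)
```

The same call then handles "a b c".split() and b"a b c".split(), and a
mixed sequence still fails with TypeError.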

Would it be heretical to suggest that sum() be allowed to work on
strings to at least eliminate ''.join() as something that breaks bytes
processing? It already works for bytes, although it then fails with a
confusing message for bytearray:

>>> sum(b"a b c".split(), b'')
b'abc'

>>> sum(bytearray(b"a b c").split(), bytearray(b''))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum bytes [use b''.join(seq) instead]

>>> sum("a b c".split(), '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

