[Python-Dev] methods on the bytes object

Sun Apr 30 19:49:58 CEST 2006

Guido van Rossum wrote:
> Well, yes, in most cases, but while attempting to write an I/O library
> I already had the urge to collect "chunks" I've read in a list and
> join them later, instead of concatenating repeatedly. I guess I should
> suppress this urge, and plan to optimize extending a bytes arrays
> instead, along the lines of how we optimize lists.

That is certainly a frequent, although degenerate, use case of
string.join (i.e. with an empty separator). Instead of introducing
bytes.join, I think we should reconsider this:

py> sum(["ab","cd"],"")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]

Why is it that sum requires numbers?
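
For concreteness, here is a minimal sketch of the two strategies in the
quoted paragraph: collecting chunks in a list and joining them once,
versus extending a single mutable buffer. It assumes the bytes and
bytearray types as they exist in current Python 3, which postdate this
discussion; the function names and the chunk size are hypothetical.

```python
import io


def read_all_join(stream, chunk_size=4):
    """Collect chunks in a list and join them once at the end."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        chunks.append(chunk)
    # Degenerate join: the separator is empty.
    return b"".join(chunks)


def read_all_extend(stream, chunk_size=4):
    """Extend one mutable buffer in place instead of keeping a list."""
    buf = bytearray()  # assumption: a mutable byte array type is available
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk  # in-place extension, no new object per chunk
    return bytes(buf)


joined = read_all_join(io.BytesIO(b"abcdefghij"))
extended = read_all_extend(io.BytesIO(b"abcdefghij"))
```

Both functions return the same data; they differ only in where the
accumulation cost is paid.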

> Still, I expect that having a bunch of string-ish methods on bytes
> arrays would be convenient for certain types of data handling. Of
> course, only those methods that don't care about character types would
> be added, but that's a long list: startswith, endswith, index, rindex,
> find, rfind, split, rsplit, join, count, replace, translate.

The problem I see with these is that people will use them for text-ish
operations, e.g.

  data.startswith("text".encode("ascii"))

While that would be "correct", I see two problems:

a) people will complain that they have to use an explicit .encode
   call, and demand that this should "just work", and
b) people will refuse to rewrite their algorithms for character
   strings (which they likely should in most applications of,
   say, .startswith), and then complain that the bytes type
   is soooo limited, and they really want a full byte string
   type back.
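
To illustrate the two styles being contrasted, a small sketch follows.
It assumes current Python 3 semantics, where bytes and character
strings do not mix implicitly; the sample data and variable names are
made up for the example.

```python
# Hypothetical input, e.g. a header line read from a socket or file.
data = b"text/plain; charset=us-ascii"

# Byte-level test: the prefix must be encoded explicitly.
is_text_bytes = data.startswith("text".encode("ascii"))

# Character-level test: decode once at the boundary, then work on
# a character string.
header = data.decode("ascii")
is_text_str = header.startswith("text")

# Passing a character string directly does not "just work".
try:
    data.startswith("text")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

The second form is the rewrite that point b) refers to: the algorithm
operates on characters, and the encoding is handled once on input.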

Regards,
Martin