[MAL]
Let's not make the same mistake again: Unicode objects should *not* be used to hold binary data. Please use buffers instead.
Easier said than done -- Python doesn't really have a buffer data type. Or do you mean the array module? It's not trivial to read a file into an array (although it's possible, there are even two ways). Fact is, most of Python's standard library and built-in objects use (8-bit) strings as buffers. I agree there's no reason to extend this to Unicode strings.
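The message does not name the two ways of reading a file into an array. The sketch below assumes they are the array module's `fromfile` method and its string-loading method (`fromstring` at the time, spelled `frombytes` in current Python). The temporary file and its contents are made up for illustration.

```python
import array
import os
import tempfile

# Create a small binary file to read back (illustrative data only).
payload = bytes(range(16))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Way 1: let the array read a known number of items from the file.
    a = array.array('B')  # 'B' = unsigned bytes
    with open(path, 'rb') as f:
        a.fromfile(f, os.path.getsize(path))

    # Way 2: read the file into a bytes object, then copy it into the array.
    b = array.array('B')
    with open(path, 'rb') as f:
        b.frombytes(f.read())
finally:
    os.remove(path)
```

Both routes need an extra step compared with a plain `f.read()`: the first needs the item count up front, the second makes an intermediate copy. That is presumably the sense in which it is "not trivial".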
BTW, I think that this behaviour should be changed:
>>> buffer('binary') + 'data'
'binarydata'
while:
>>> 'data' + buffer('binary')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: illegal argument type for built-in operation
IMHO, buffer objects should never coerce to strings, but instead return a buffer object holding the combined contents. The same applies to slicing buffer objects:
>>> buffer('binary')[2:5]
'nar'
should preferably be buffer('nar').
Note that a buffer object doesn't hold data! It's only a pointer to data. I can't off-hand explain the asymmetry though.
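The "pointer to data" point can be sketched with `memoryview`, the later replacement for `buffer` in current Python. Using it as a stand-in is an assumption of this example, not something the thread discusses. The view does not copy the bytes, so a change to the underlying object shows through it.

```python
backing = bytearray(b'binary')
view = memoryview(backing)        # no copy: the view refers to backing's memory

backing[0] = ord('B')             # modify the underlying data in place
seen_through_view = bytes(view)   # the view reflects the change

sliced = view[2:5]                # slicing a view yields another view, not a copy
sliced_is_view = isinstance(sliced, memoryview)
sliced_bytes = bytes(sliced)
```

Here slicing returns another view rather than a string, which is close to the slicing behaviour requested earlier in the thread.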
--
Hmm, perhaps we need something like a data string object to get this 100% right ?!
>>> d = data("...data...")
or
>>> d = d"...data..."
>>> print type(d)
<type 'data'>
>>> 'string' + d
d"string...data..."
>>> u'string' + d
d"s\000t\000r\000i\000n\000g\000...data..."
>>> d[:5]
d"...da"
etc.
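A rough sketch of how such a `data` type might behave, written for current Python, where `bytes` plays the role of the 8-bit string and `str` the role of the Unicode string. The class name `Data`, the helper `_as_bytes`, and the choice of UTF-16-LE for text operands (picked only because it reproduces the `s\000t\000...` pattern shown above) are all assumptions of this example; no such built-in type exists.

```python
def _as_bytes(value):
    """Hypothetical helper: turn an operand into raw bytes."""
    if isinstance(value, str):
        # Assumption: text is stored as UTF-16-LE, matching the example output.
        return value.encode('utf-16-le')
    return bytes(value)


class Data(bytes):
    """Hypothetical binary-data type; not an actual Python built-in."""

    def __add__(self, other):
        # Data + anything stays Data.
        return Data(bytes(self) + _as_bytes(other))

    def __radd__(self, other):
        # anything + Data also stays Data (no coercion back to a string).
        return Data(_as_bytes(other) + bytes(self))

    def __getitem__(self, index):
        result = bytes.__getitem__(self, index)
        # Slices stay Data; a single index returns an int, as with bytes.
        return Data(result) if isinstance(index, slice) else result

    def __repr__(self):
        return 'd' + bytes.__repr__(self)[1:]


d = Data(b'...data...')
from_bytes = b'string' + d   # 8-bit string on the left
from_text = 'string' + d     # Unicode string on the left
head = d[:5]
```

The point of the sketch is the closure property: concatenation and slicing always give back the binary type instead of coercing to a string.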
Ideally, string and Unicode objects would then be subclasses of this type in Py3K.
Not clear. I'd rather do the equivalent of byte arrays in Java, for which no "string literal" notations exist.

--Guido van Rossum (home page: http://www.python.org/~guido/)