[I18n-sig] Re: [Python-Dev] Unicode debate
Guido van Rossum
guido@python.org
Tue, 02 May 2000 08:26:50 -0400
[MAL]
> Let's not do the same mistake again: Unicode objects should *not*
> be used to hold binary data. Please use buffers instead.
Easier said than done -- Python doesn't really have a buffer data
type. Or do you mean the array module? It's not trivial to read a
file into an array (although it's possible; there are even two ways).
Fact is, most of Python's standard library and built-in objects use
(8-bit) strings as buffers.
I agree there's no reason to extend this to Unicode strings.
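For readers on present-day Python, here is a minimal sketch of two ways to load a file into an `array.array`. It uses Python 3 APIs (`array.fromfile` and `array.frombytes`), which are not necessarily the two ways Guido had in mind in 2000. The helper names `read_with_fromfile` and `read_with_frombytes` are hypothetical.

```python
import array
import os
import tempfile  # used only by the usage example / test


def read_with_fromfile(path):
    """Hypothetical helper: read a whole file into array('B') via fromfile()."""
    size = os.path.getsize(path)          # number of bytes = number of 'B' items
    arr = array.array("B")                # 'B' = unsigned char, one item per byte
    with open(path, "rb") as f:
        arr.fromfile(f, size)             # raises EOFError if fewer items are available
    return arr


def read_with_frombytes(path):
    """Hypothetical helper: read the file as bytes, then copy them into an array."""
    arr = array.array("B")
    with open(path, "rb") as f:
        arr.frombytes(f.read())           # extra copy, but no size lookup needed
    return arr
```

The first variant reads straight into the array. The second goes through an intermediate bytes object, which illustrates Guido's point that ordinary strings (now `bytes`) end up serving as the buffer type anyway.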
> BTW, I think that this behaviour should be changed:
>
> >>> buffer('binary') + 'data'
> 'binarydata'
>
> while:
>
> >>> 'data' + buffer('binary')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: illegal argument type for built-in operation
>
> IMHO, buffer objects should never coerce to strings, but instead
> return a buffer object holding the combined contents. The
> same applies to slicing buffer objects:
>
> >>> buffer('binary')[2:5]
> 'nar'
>
> should preferably be buffer('nar').
Note that a buffer object doesn't hold data! It's only a pointer to
data. I can't explain the asymmetry offhand, though.
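The `buffer` builtin discussed here was removed in Python 3; `memoryview` is its closest successor. The sketch below, written for Python 3 rather than the Python of this thread, shows both points under discussion. Slicing a view returns another view, as MAL proposes. The view holds no data of its own and only refers to the owner's memory, which is Guido's "pointer to data" point.

```python
# Python 3 sketch: memoryview stands in for the old `buffer` type.
data = bytearray(b"binary")    # the object that actually owns the bytes
view = memoryview(data)        # a view onto data: no copy is made
part = view[2:5]               # slicing a view yields another view, not bytes

print(type(part).__name__)     # memoryview
print(part.tobytes())          # b'nar'

data[2] = ord("N")             # mutate the owner in place...
print(part.tobytes())          # ...and the slice sees the change: b'Nar'

# Concatenating with bytes on the left copies everything into a new bytes object.
combined = b"data" + view
print(combined)                # b'databiNary'
```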
> --
>
> Hmm, perhaps we need something like a data string object
> to get this 100% right ?!
>
> >>> d = data("...data...")
> or
> >>> d = d"...data..."
> >>> print type(d)
> <type 'data'>
>
> >>> 'string' + d
> d"string...data..."
> >>> u'string' + d
> d"s\000t\000r\000i\000n\000g\000...data..."
>
> >>> d[:5]
> d"...da"
>
> etc.
>
> Ideally, string and Unicode objects would then be subclasses
> of this type in Py3K.
Not clear. I'd rather do the equivalent of byte arrays in Java, for
which no "string literal" notations exist.
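For comparison, Python 3 (designed long after this thread) ended up with both options. It has an immutable `bytes` type with a `b"..."` literal, close to MAL's `d"..."` idea, and a mutable `bytearray` with no literal of its own, closer to the Java-style byte array Guido prefers. It also refuses implicit text/binary coercion. The sketch below assumes that MAL's `u'string' + d` example shows little-endian UTF-16 code units, so it encodes explicitly with `"utf-16-le"`.

```python
# Python 3 sketch: text and binary data are separate types.
text = "string"                  # str: Unicode text
blob = b"...data..."             # bytes: immutable, has a b"..." literal
buf = bytearray(b"...data...")   # bytearray: mutable, no literal of its own

# No implicit coercion between text and bytes.
try:
    text + blob
except TypeError as exc:
    print("TypeError:", exc)

# Turning text into bytes requires naming an encoding explicitly.
raw = text.encode("utf-16-le")
print(raw)                       # b's\x00t\x00r\x00i\x00n\x00g\x00'

combined = bytearray(raw) + buf  # bytearray + bytearray -> bytearray
print(type(combined).__name__)   # bytearray
print(combined[:5])              # bytearray(b's\x00t\x00r')
```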
--Guido van Rossum (home page: http://www.python.org/~guido/)