Re: [I18n-sig] Re: [Python-Dev] Unicode debate

2 May 2000

Moshe Zadka wrote:
...
I'd much prefer Python to reflect a
fundamental truth about Unicode, which at least makes sure binary-goop can
pass through Unicode and remain unharmed, than to reflect a nasty problem
with UTF-8 (not everything is legal).

Let's not make the same mistake again: Unicode objects should *not*
be used to hold binary data. Please use buffers instead.

BTW, I think that this behaviour should be changed:
>>> buffer('binary') + 'data'
'binarydata'
while:
>>> 'data' + buffer('binary')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: illegal argument type for built-in operation
IMHO, buffer objects should never coerce to strings, but instead
return a buffer object holding the combined contents. The
same applies to slicing buffer objects:
>>> buffer('binary')[2:5]
'nar'
should preferably be buffer('nar').
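
A minimal sketch of the behaviour asked for above, written for present-day Python 3 rather than the interpreter used in this thread. `Buffer` is a hypothetical class built on `bytes`; it is not the built-in `buffer` type, and the method bodies are only one possible way to get the result described: concatenation in either order and slicing both hand back a buffer-like object instead of a plain string.

```python
class Buffer(bytes):
    """Hypothetical stand-in for the proposed semantics (not the real built-in)."""

    def __add__(self, other):
        # Buffer + bytes-like -> Buffer
        return Buffer(bytes(self) + bytes(other))

    def __radd__(self, other):
        # bytes-like + Buffer -> Buffer, so the operation works in both orders
        return Buffer(bytes(other) + bytes(self))

    def __getitem__(self, index):
        result = bytes(self)[index]
        # Slices stay Buffers; a single index still yields an int, as for bytes
        return Buffer(result) if isinstance(index, slice) else result

    def __repr__(self):
        return "Buffer(%r)" % bytes(self)


print(Buffer(b"binary") + b"data")
print(b"data" + Buffer(b"binary"))
print(Buffer(b"binary")[2:5])
```

The `__radd__` method is what removes the asymmetry shown in the traceback: because `bytes` does not handle the mixed case itself, Python falls back to the right-hand operand.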

--

Hmm, perhaps we need something like a data string object
to get this 100% right ?!
>>> d = data("...data...")
or
>>> d = d"...data..."
>>> print type(d)

>>> 'string' + d
d"string...data..."
>>> u'string' + d
d"s\000t\000r\000i\000n\000g\000...data..."
>>> d[:5]
d"...da"
etc.
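
The data string idea could be sketched roughly as follows, again in present-day Python 3. Everything here is an assumption made for illustration: the class name `Data`, the helper `_to_raw`, and in particular the rule that text operands contribute two bytes per character with the low byte first (UTF-16-LE), which is chosen only because it reproduces the `s\000t\000...` pattern in the example above. There is no `d"..."` literal, so the constructor is used instead.

```python
class Data:
    """Hypothetical 'data string' type; name and conversion rules are assumptions."""

    def __init__(self, raw=b""):
        self._raw = bytes(raw)

    @staticmethod
    def _to_raw(value):
        # Return the raw bytes an operand contributes, or None if unsupported
        if isinstance(value, Data):
            return value._raw
        if isinstance(value, (bytes, bytearray, memoryview)):
            return bytes(value)
        if isinstance(value, str):
            # Assumption: text is stored as two bytes per character, low byte first
            return value.encode("utf-16-le")
        return None

    def __add__(self, other):
        raw = self._to_raw(other)
        return NotImplemented if raw is None else Data(self._raw + raw)

    def __radd__(self, other):
        raw = self._to_raw(other)
        return NotImplemented if raw is None else Data(raw + self._raw)

    def __getitem__(self, index):
        # Slices stay Data objects; a single index yields an int
        if isinstance(index, slice):
            return Data(self._raw[index])
        return self._raw[index]

    def __len__(self):
        return len(self._raw)

    def __eq__(self, other):
        return isinstance(other, Data) and self._raw == other._raw

    def __repr__(self):
        return "Data(%r)" % self._raw


d = Data(b"...data...")
print(b"string" + d)   # byte string operand is appended as-is
print("string" + d)    # text operand is encoded first
print(d[:5])
```

Whatever the operand type, the result is always a `Data` object, which is the property the examples above are after.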

Ideally, string and Unicode objects would then be subclasses
of this type in Py3K.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
