[Python-Dev] Disabling Unicode readbuffer interface

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Thu, 21 Sep 2000 18:19:53 +0200


> Martin, haven't you read my last post to Guido?

I've read

http://www.python.org/pipermail/python-dev/2000-September/016162.html

where you express a preference for disabling the getreadbuf slot, in
addition to special-casing Unicode objects in s#. I've just tested the
effects of your solution 1 on the test suite. Or are you referring to
a different message?

> Completely disabling getreadbuf is not a solution worth considering --
> it breaks far too much code which the test suite doesn't even test,
> e.g. MarkH's win32 stuff produces tons of Unicode objects which
> then can get passed to potentially all of the stdlib. The test suite
> doesn't check these cases.

Do you have any specific examples of what else would break? Looking at
all occurrences of 's#' in the standard library, I can't find a single
case where the current behaviour would be right; in all cases, raising
an exception would be better. Again, any counter-examples?

>     Special case Unicode in getargs.c's code for "s#" only and leave
>     getreadbuf enabled. "s#" could then return the default encoded
>     value for the Unicode object while SRE et al. could still use 
>     PyObject_AsReadBuffer() to get at the raw data.

I think your option 2 is acceptable, although I feel that option 1
would expose more potential problems. What if an application
unknowingly passes a Unicode object to md5.update? In testing, it may
always succeed because only ASCII data is used, and then suddenly start
breaking once some user enters non-ASCII strings.
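A minimal sketch of that failure mode, assuming the implicit conversion under option 2 uses an ASCII default encoding. `DEFAULT_ENCODING`, `implicit_s_hash` and `md5_of` are hypothetical names that model the conversion in Python; they are not the actual getargs.c code.

```python
import hashlib

# Assumption: the implicit Unicode-to-bytes conversion uses ASCII as the
# default encoding; DEFAULT_ENCODING is a hypothetical stand-in for it.
DEFAULT_ENCODING = "ascii"


def implicit_s_hash(obj):
    """Hypothetical model of option 2: silently convert str to bytes
    with the default encoding before the C function sees it."""
    if isinstance(obj, str):
        return obj.encode(DEFAULT_ENCODING)  # may raise UnicodeEncodeError
    return bytes(memoryview(obj))  # other objects must expose a buffer


def md5_of(obj):
    """Hash obj the way an md5.update() call with implicit conversion would."""
    return hashlib.md5(implicit_s_hash(obj)).hexdigest()


print(md5_of("hello"))  # ASCII-only test data: succeeds

try:
    md5_of("caf\u00e9")  # non-ASCII input entered by a user
except UnicodeEncodeError as exc:
    print("fails only once real data arrives:", exc.reason)
```

Every test using ASCII data passes, and the error only appears when the input happens to contain a non-ASCII character.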

Using the internal representation would also be wrong in this case: the
md5 hash would depend on the byte order, which is probably not desired (*).
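The byte-order dependence can be sketched as follows, assuming the internal buffer holds 16-bit code units in the machine's native byte order; the sketch emulates both orders with explicit UTF-16 codecs rather than reading any real internal buffer. Hashing a fixed, byte-order-free encoding such as UTF-8 is shown only as one possible convention, not as an answer the thread settled on.

```python
import hashlib
import sys

text = "abc"

# Assumption: the internal buffer holds UTF-16 code units in the machine's
# native byte order; both possible orders are emulated explicitly here.
little = hashlib.md5(text.encode("utf-16-le")).hexdigest()
big = hashlib.md5(text.encode("utf-16-be")).hexdigest()
native = little if sys.byteorder == "little" else big

# An explicit, byte-order-free encoding gives the same digest everywhere.
portable = hashlib.md5(text.encode("utf-8")).hexdigest()

print("native order:", sys.byteorder, native)
print("digests differ across byte orders:", little != big)
print("utf-8 digest:", portable)
```

The same string yields different digests on little-endian and big-endian machines, while the UTF-8 digest is identical on both.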

In any case, your option 2 would be a big improvement over the current
state, so I'll just shut up.

Regards,
Martin

(*) BTW, is there a meaningful way to define md5 for a Unicode string?