[Python-Dev] Disabling Unicode readbuffer interface
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Thu, 21 Sep 2000 18:19:53 +0200
> Martin, haven't you read my last post to Guido ?
I've read
http://www.python.org/pipermail/python-dev/2000-September/016162.html
where you express a preference for disabling the getreadbuf slot, in
addition to special-casing Unicode objects in s#. I've just tested the
effects of your solution 1 on the test suite. Or are you referring to
a different message?
> Completely disabling getreadbuf is not a solution worth considering --
> it breaks far too much code which the test suite doesn't even test,
> e.g. MarkH's win32 stuff produces tons of Unicode object which
> then can get passed to potentially all of the stdlib. The test suite
> doesn't check these cases.
Do you have any specific examples of what else would break? Looking at
all occurrences of 's#' in the standard library, I can't find a single
case where the current behaviour would be right - in all cases raising
an exception would be better. Again, any counter-examples?
> Special case Unicode in getargs.c's code for "s#" only and leave
> getreadbuf enabled. "s#" could then return the default encoded
> value for the Unicode object while SRE et al. could still use
> PyObject_AsReadBuffer() to get at the raw data.
I think your option 2 is acceptable, although I feel that option 1
would expose more potential problems. What if an application
unknowingly passes a unicode object to md5.update? In testing, it may
always succeed as ASCII-only data is used, and it will suddenly start
breaking when non-ASCII strings are entered by some user.
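To illustrate the failure mode, here is a minimal sketch in present-day Python. It assumes an ASCII default encoding, and the helper names to_default_encoded and md5_of_text are hypothetical stand-ins for what "s#" plus md5.update would do under option 2; hashlib is used in place of the old md5 module.

```python
import hashlib


def to_default_encoded(text, default_encoding="ascii"):
    """Hypothetical stand-in for "s#" returning the default-encoded value."""
    return text.encode(default_encoding)


def md5_of_text(text):
    """Hash a text string the way an unaware application might."""
    return hashlib.md5(to_default_encoded(text)).hexdigest()


# ASCII-only test data goes through without complaint.
ascii_digest = md5_of_text("hello")

# Input containing a non-ASCII character fails only at run time.
try:
    md5_of_text("h\u00e9llo")
    failed = False
except UnicodeEncodeError:
    failed = True
```

With ASCII-only input the call succeeds, so a test run would not reveal the problem; the exception appears only once a non-ASCII character arrives.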
Using the internal rep would also be wrong in this case - the md5 hash
would depend on the byte order, which is probably not desired (*).
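The byte-order dependence can be sketched as follows, again in present-day Python. The assumption here is that the internal representation resembles 16-bit code units in native byte order, which the two explicit UTF-16 encodings below merely approximate.

```python
import hashlib

text = "abc"

# Same characters, two byte orders for the 16-bit code units.
little_endian = text.encode("utf-16-le")
big_endian = text.encode("utf-16-be")

digest_le = hashlib.md5(little_endian).hexdigest()
digest_be = hashlib.md5(big_endian).hexdigest()

# Choosing one explicit encoding makes the hashed bytes platform-independent.
digest_utf8 = hashlib.md5(text.encode("utf-8")).hexdigest()
```

The two UTF-16 digests differ although the text is identical, whereas hashing an explicitly chosen encoding such as UTF-8 gives the same bytes, and so the same digest, on every machine.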
In any case, your option 2 would be a big improvement over the current
state, so I'll just shut up.
Regards,
Martin
(*) BTW, is there a meaningful way to define md5 for a Unicode string?