[Python-Dev] Disabling Unicode readbuffer interface

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 20 Sep 2000 21:22:24 +0200


I just tried to disable the getreadbufferproc on Unicode objects. Most
of the test suite continues to work. 

test_unicode fails because "s#" no longer works in readbuffer_encode
when the unicode_internal encoding is tested. That could be fixed (*).

More worrying, sre fails when matching a unicode string. sre uses
the getreadbufferproc to get at the internal representation. If the
buffer has sizeof(Py_UNICODE) times as many bytes as the object is
long, sre concludes that it got a unicode buffer (?!?).
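
The size check described above can be paraphrased in Python roughly as
follows. This is only a sketch of the heuristic, not sre's actual C
code: guess_char_size and its parameters are made-up names, and
unicode_size is an assumed stand-in for sizeof(Py_UNICODE).

```python
def guess_char_size(length, nbytes, unicode_size=2):
    """Guess the width of one character from buffer size alone.

    length       -- len() of the object being matched (hypothetical name)
    nbytes       -- number of bytes the read buffer exposes
    unicode_size -- stand-in for sizeof(Py_UNICODE); 2 is an assumption
    """
    if nbytes == length * unicode_size:
        # Sizes match a Unicode buffer, so assume it is one.
        return unicode_size
    if nbytes == length:
        # One byte per item: treat it as an 8-bit string.
        return 1
    raise TypeError("buffer size mismatch")
```

Under this sketch, any object whose buffer happens to be unicode_size
times its length is taken for Unicode, whatever it actually contains.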

I'm not sure what the right solution would be in this case: I *think*
sre should have more specific knowledge of Unicode objects, so it
should support objects with a buffer interface representing a 1-byte
character string, or Unicode objects. Actually, is there anything
wrong with sre operating on string and unicode objects only? It
requires that the buffer has a single segment, anyway...
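
For comparison, a check based on the object's type rather than on
buffer sizes might look like the sketch below. The name
char_size_by_type is hypothetical, unicode_size again stands in for
sizeof(Py_UNICODE), and type(u"") and type(b"") are used only to name
the Unicode and 8-bit string types.

```python
UNICODE_TYPE = type(u"")  # the Unicode string type
BYTES_TYPE = type(b"")    # the 8-bit string type

def char_size_by_type(obj, unicode_size=2):
    """Pick the character width from the object's type (illustrative only)."""
    if isinstance(obj, UNICODE_TYPE):
        return unicode_size
    if isinstance(obj, BYTES_TYPE):
        return 1
    # Anything else is rejected instead of being guessed at.
    raise TypeError("expected a string or unicode object")
```

This variant accepts only the two string types; supporting other
objects that expose a 1-byte character buffer would need an extra
branch.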

Regards,
Martin

(*) The 'internal encoding' function should access the representation
of the unicode object directly, and readbuffer_encode could be
rewritten in Python:

def readbuffer_encode(o, errors="strict"):
    # errors is accepted but not used here
    b = buffer(o)  # fails for objects without a read buffer
    return str(b), len(b)

or be removed altogether, as it would (rightfully) stop working on
unicode objects.