Greg Ewing email@example.com wrote:
Josiah Carlson wrote:
A bit of free thought brings me to the (half-baked) idea that if string methods accepted any object which conformed to the buffer interface; mmap, buffer, array, ... instances could gain all of the really convenient methods that make strings the objects to use in many cases.
Not a bad idea, but they couldn't literally be string methods. They'd have to be standalone functions like we used to have in the string module before it got mercilessly deprecated. :-)
Not sure what happens to this when the unicode/bytearray future arrives, though. Treating a buffer of bytes as a character string isn't going to be so straightforward then.
Here's my thought: One could modify string methods to check the type of the input (string, unicode, or other). That check turns on a flag for whether the method returns are string, unicode, or buffers. One uses PyObject_AsBuffer() methods to pull the char* and length for any input offering the buffer protocol.
Now here's the fun part: One makes the methods aware of the type of the self parameter. One sets the 'split' method for the buffer object to be 'string_split', etc.
Unicode does indeed get tricky, how does one handle buffers of unicode objects? Right now, you get the raw pointer and underlying length ( len (buffer(u'hello')) == 10 ). If there was a unicode buffer (perhaps ubuffer), that would work, but I'm not sure I really like it.
I notice much of the discussion on 'string views', which to me seems like another way of saying 'buffer', and if there is a 'string view', there would necessarily need to be a 'unicode view'.
As for the bytes type, from what I understand, they should directly support buffers without issue.