[Python-Dev] Proof of the pudding: str.partition()

Thu Sep 1 08:03:22 CEST 2005

Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> 
> Josiah Carlson wrote:
> 
> > A bit of free thought brings me to the (half-baked) idea that if string
> > methods accepted any object which conformed to the buffer interface;
> > mmap, buffer, array, ... instances could gain all of the really
> > convenient methods that make strings the objects to use in many cases.
> 
> Not a bad idea, but they couldn't literally be string methods.
> They'd have to be standalone functions like we used to have in
> the string module before it got mercilessly deprecated. :-)
> 
> Not sure what happens to this when the unicode/bytearray future
> arrives, though. Treating a buffer of bytes as a character
> string isn't going to be so straightforward then.

Here's my thought:
One could modify string methods to check the type of the input (string,
unicode, or other).  That check turns on a flag for whether the method
returns are string, unicode, or buffers.  One uses PyObject_AsBuffer()
methods to pull the char* and length for any input offering the buffer
protocol.

Now here's the fun part: One makes the methods aware of the type of the
self parameter.  One sets the 'split' method for the buffer object to be
'string_split', etc.

Unicode does indeed get tricky, how does one handle buffers of unicode
objects?  Right now, you get the raw pointer and underlying length ( len
(buffer(u'hello')) == 10 ). If there was a unicode buffer (perhaps
ubuffer), that would work, but I'm not sure I really like it.

I notice much of the discussion on 'string views', which to me seems
like another way of saying 'buffer', and if there is a 'string view',
there would necessarily need to be a 'unicode view'.

As for the bytes type, from what I understand, they should directly
support buffers without issue.

 - Josiah