[Python-Dev] strop vs. string

Paul Barrett Barrett@stsci.edu
Mon, 04 Jun 2001 09:22:14 -0400

"M.-A. Lemburg" wrote:
> Tim Peters wrote:
> >
> > [Tim]
> > > About combining strop and buffers and strings, don't forget
> > > unicodeobject.c:  that's got oodles of basically duplicate code too.
> > > /F suggested dealing with the minor differences via maintaining one
> > > code file that gets compiled multiple times w/ appropriate #defines.
> >
> > [MAL]
> > > Hmm, that only saves us a few kB in source, but certainly not
> > > in the object files.
> >
> > That's not the point.  Manually duplicated code blocks always get out of
> > synch, as people fix bugs in, or enhance, one of them but don't even know
> > about the others.  /F brought this up after I pissed away a few hours trying
> > to repair one of these in all places, and he noted that strop.replace() and
> > string.replace() are woefully inefficient anyway.
> Ok, so what we'd need is a bunch of generic low-level string
> operations: one set for 8-bit and one for 16-bit code.
> Looking at unicodeobject.c it seems that the section "Helpers" would
> be a good start, plus perhaps a few bits from the method implementations
> refactored to form a low-level string template library.
> Perhaps we should move this code into
> a file stringhelpers.h which then gets included by stringobject.c
> and unicodeobject.c with appropriate #defines set up for
> 8-bit strings and for Unicode.
> > > The better idea would be making the types subclass from a generic
> > > abstract string object -- I just don't know how this will be
> > > possible with Guido's type patches. We'll just have to wait,
> > > I guess.

From the discussion so far, it appears that the buffer object is
intended solely to support string-like objects.  I've seen no mention
of their use for binary data objects, such as multidimensional arrays
and matrices.  Will the buffer object also support these objects?  If
no, then I suggest it be renamed to one that is less generic and more

On the other hand, if yes, then I think the buffer C/API needs to be
reimplemented, because the current design/implementation falls far
short of what I would expect for a buffer object.  First, it is overly
complex: the support for multiple buffers does not appear necessary. 
Second, the dangling pointer issue has not been resolved.  I suggest
the addition of a lock flag which indicates that the data is currently
inaccessible, i.e., that the data and/or data pointer is in the process of
being modified.

I would suggest the following structure to be much more useful for
char and binary data:

typedef struct {
    char* rf_pointer;
    int   rf_length;
    int   rf_access;  /* read, write, etc.  */
    int   rf_lock;    /* data is in use  */
    int   rf_flags;   /* type of data; char, binary, unicode, etc.  */
} PyBufferProcs;

But I'm guessing my proposal is way off base.
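
To make the intent of the lock flag concrete, here is a rough sketch of
how a consumer and an owner might cooperate through the structure above.
Everything except the rf_* field names is an assumption made for
illustration: the struct is renamed rf_buffer so it does not clash with
the existing PyBufferProcs, the RF_READ/RF_WRITE bits and the helper
functions rf_begin_modify, rf_end_modify and rf_read are hypothetical,
and the flag is a plain int, so this says nothing about thread safety.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical access bits; the proposal only says "read, write, etc." */
#define RF_READ  0x1
#define RF_WRITE 0x2

/* Same fields as the proposed structure, under a hypothetical name. */
typedef struct {
    char *rf_pointer;
    int   rf_length;
    int   rf_access;  /* read, write, etc. */
    int   rf_lock;    /* data is in use */
    int   rf_flags;   /* type of data; char, binary, unicode, etc. */
} rf_buffer;

/* Owner side: mark the data as inaccessible before changing the data
   or the data pointer.  Returns 0 on success, -1 if already locked. */
static int rf_begin_modify(rf_buffer *buf)
{
    assert(buf != NULL);
    if (buf->rf_lock)
        return -1;
    buf->rf_lock = 1;
    return 0;
}

/* Owner side: the data and pointer are consistent again. */
static void rf_end_modify(rf_buffer *buf)
{
    assert(buf != NULL);
    buf->rf_lock = 0;
}

/* Consumer side: copy at most n bytes into dst.  Returns the number of
   bytes copied, or -1 if the buffer is locked, is not readable, or n is
   negative.  The pointer is only dereferenced after the lock check. */
static int rf_read(const rf_buffer *buf, char *dst, int n)
{
    assert(buf != NULL && dst != NULL);
    if (buf->rf_lock || !(buf->rf_access & RF_READ) || n < 0)
        return -1;
    if (n > buf->rf_length)
        n = buf->rf_length;
    memcpy(dst, buf->rf_pointer, (size_t)n);
    return n;
}
```

An owner that needs to reallocate its storage would call
rf_begin_modify, update rf_pointer and rf_length, and then call
rf_end_modify; a consumer calling rf_read in between gets -1 back
instead of following a stale pointer.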

If I find some time, I'll prepare a PEP to air these issues, since
they are very important to those of us working on and with
multidimensional arrays. We find the current buffer API lacking.

Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218