[Python-Dev] strop vs. string

Mon, 04 Jun 2001 09:22:14 -0400

"M.-A. Lemburg" wrote:
> 
> Tim Peters wrote:
> >
> > [Tim]
> > > About combining strop and buffers and strings, don't forget
> > > unicodeobject.c:  that's got oodles of basically duplicate code too.
> > > /F suggested dealing with the minor differences via maintaining one
> > > code file that gets compiled multiple times w/ appropriate #defines.
> >
> > [MAL]
> > > Hmm, that only saves us a few kB in source, but certainly not
> > > in the object files.
> >
> > That's not the point.  Manually duplicated code blocks always get out of
> > synch, as people fix bugs in, or enhance, one of them but don't even know
> > about the others.  /F brought this up after I pissed away a few hours trying
> > to repair one of these in all places, and he noted that strop.replace() and
> > string.replace() are woefully inefficient anyway.
> 
> Ok, so what we'd need is a bunch of generic low-level string
> operations: one set for 8-bit and one for 16-bit code.
> 
> Looking at unicodeobject.c it seems that the section "Helpers" would
> be a good start, plus perhaps a few bits from the method implementations
> refactored to form a low-level string template library.
> 
> Perhaps we should move this code into
> a file stringhelpers.h which then gets included by stringobject.c
> and unicodeobject.c with appropriate #defines set up for
> 8-bit strings and for Unicode.
> 
> > > The better idea would be making the types subclass from a generic
> > > abstract string object -- I just don't know how this will be
> > > possible with Guido's type patches. We'll just have to wait,
> > > I guess.
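
For reference, the single-source idea quoted above (one body of code
instantiated once per character width) can be sketched roughly as
follows.  This is only an illustration under stated assumptions: the
names DEFINE_FIND_CHAR, bytes_find_char and ucs2_find_char are
hypothetical, and the template is folded into a macro so that the
example fits in one file, instead of living in a shared header that is
included twice with different #defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical template: one search routine, written once.  In the scheme
   discussed above this body would sit in a shared header and be included
   once per character type; a macro stands in for that here. */
#define DEFINE_FIND_CHAR(NAME, CHAR_T)                          \
    static ptrdiff_t NAME(const CHAR_T *s, size_t n, CHAR_T ch) \
    {                                                           \
        size_t i;                                               \
        assert(s != NULL || n == 0);                            \
        for (i = 0; i < n; i++)                                 \
            if (s[i] == ch)                                     \
                return (ptrdiff_t)i;                            \
        return -1;                                              \
    }

/* Two instantiations: 8-bit strings and 16-bit code units. */
DEFINE_FIND_CHAR(bytes_find_char, unsigned char)
DEFINE_FIND_CHAR(ucs2_find_char, unsigned short)
```

A bug fixed in the template is then fixed for both widths at once,
which is the maintenance point being made; the object code is still
generated twice.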

From the discussion so far, it appears that the buffer object is
intended solely to support string-like objects.  I've seen no mention
of its use for binary data objects, such as multidimensional arrays
and matrices.  Will the buffer object also support these objects?  If
not, then I suggest it be renamed to something less generic and more
descriptive.

On the other hand, if yes, then I think the buffer C/API needs to be
reimplemented, because the current design/implementation falls far
short of what I would expect for a buffer object.  First, it is overly
complex: the support for multiple buffers does not appear necessary.
Second, the dangling pointer issue has not been resolved.  I suggest
the addition of a lock flag which indicates that the data is currently
inaccessible, i.e. that the data and/or the data pointer is in the
process of being modified.

I suggest that the following structure would be much more useful for
char and binary data:

typedef struct {
    char* rf_pointer;
    int   rf_length;
    int   rf_access;  /* read, write, etc.  */
    int   rf_lock;    /* data is in use  */
    int   rf_flags;   /* type of data; char, binary, unicode, etc.  */
} PyBufferProcs;
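
A minimal sketch of how the rf_lock flag might be honoured, assuming a
single flag that is held either by a consumer using the data or by the
owner while it moves the data.  The struct is repeated under the
hypothetical name RawBuffer, and rawbuffer_acquire, rawbuffer_release
and rawbuffer_rebind are likewise hypothetical; none of this is an
existing Python C API, and a plain int flag is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as the proposed structure, under a hypothetical name. */
typedef struct {
    char *rf_pointer;
    int   rf_length;
    int   rf_access;  /* read, write, etc. */
    int   rf_lock;    /* data is in use */
    int   rf_flags;   /* type of data */
} RawBuffer;

/* Consumer side: returns the data pointer and sets the lock,
   or returns NULL if the lock is already held. */
static char *rawbuffer_acquire(RawBuffer *buf)
{
    assert(buf != NULL);
    if (buf->rf_lock)
        return NULL;
    buf->rf_lock = 1;
    return buf->rf_pointer;
}

/* Consumer side: gives the buffer back. */
static void rawbuffer_release(RawBuffer *buf)
{
    assert(buf != NULL);
    buf->rf_lock = 0;
}

/* Owner side: the data may only be moved or resized while nobody holds
   the lock, so an acquired pointer cannot be left dangling.
   Returns 0 on success, -1 if the buffer is in use. */
static int rawbuffer_rebind(RawBuffer *buf, char *data, int length)
{
    assert(buf != NULL);
    if (buf->rf_lock)
        return -1;
    buf->rf_pointer = data;
    buf->rf_length = length;
    return 0;
}
```

The point of the sketch is only that the pointer is never handed out
and changed at the same time.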

But I'm guessing my proposal is way off base.

If I find some time, I'll prepare a PEP to air these issues, since
they are very important to those of us working on and with
multidimensional arrays. We find the current buffer API lacking.

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218