[Numpy-discussion] Trying out Numeric3

Scott Gilbert xscottg at yahoo.com
Fri Mar 25 03:35:35 EST 2005


--- Travis Oliphant <oliphant at ee.byu.edu> wrote:

> How would somebody change the status of that
> and re-open the PEP?

I believe all it would take is a note to the python-dev mailing list by the
new champion who was willing to implement and defend it.  The text is
public domain, so there's no copyright silliness if you need to make
changes.

I'm curious to see how this flies, as this has always been one of their pet
peeve topics.  Talking about buffer objects/protocols in general draws ire
from some and dead silence from the rest.  :-)


> 
> Also, numarray has a memory object implemented that is a good start on 
> the implementation.  So, this wouldn't be a huge job at this point.
> 

The memory object is a very good start.  I don't know if it tries to be
usable when the GIL is released, or if it handles the slice semantics the
same way.  I think doing pickling really well is a non-trivial issue -  at
least if this object is going into the core of Python.

Implementing the new pickling protocol is not terribly difficult, and any
object can do it, but that only solves the space half of the problem.  The
new pickling protocol allows one to serialize large data without making one
large copy of the binary data as a string, but one still has to make a lot
of little copies of the data, a piece at a time.  Those many little pieces
cost time to allocate and memcpy just to be written to a file and
discarded.  It would be great if the Python core libraries (cPickle) could
be "taught" about the new type and serialize directly from the memory that
is already there without creating any new string copies.
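To be concrete about what I mean, here is a minimal, purely hypothetical
sketch (present-day Python, not the actual numarray or cPickle code) of
chunked pickling: the buffer goes out as many small strings, so no single
full-size copy is ever made, but each chunk is still allocated and
memcpy'd on its way to the file:

    import pickle

    CHUNK = 64 * 1024  # invented chunk size, purely illustrative

    class ChunkedBuffer(object):
        """Toy stand-in for a large binary buffer object."""
        def __init__(self, data=b""):
            self.data = bytearray(data)

        def __reduce_ex__(self, protocol):
            # Emit the data as many small strings instead of one giant one:
            # there is never a single full-size copy, but every chunk is
            # still a fresh allocation and a memcpy before it hits the file.
            chunks = [bytes(self.data[i:i + CHUNK])
                      for i in range(0, len(self.data), CHUNK)]
            return (_rebuild, (chunks,))

    def _rebuild(chunks):
        return ChunkedBuffer(b"".join(chunks))

    # Round trip: works, but pays for all those little intermediate copies.
    buf = ChunkedBuffer(b"\x01" * (3 * CHUNK + 17))
    restored = pickle.loads(pickle.dumps(buf, protocol=2))
    assert restored.data == buf.data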

> 
> > The "meta" attribute would be a small change.  It's possible
> > to do that with composition or inheritance instead, but that's
> > really just a matter of taste.
> >
> I don't think I fully understand what you mean by "composition"
> --- like a mixin class?  or how inheritance solves the problem
> on a C-API level?
> 
> I'm mainly thinking of Extension modules that want to use each others' 
> memory on a C-level.  That would be the main use of the meta information.
>

It would be a lot like putting a similar meta dictionary on the builtin
"list" object.  Many people wouldn't use it and would consider it a tiny
wart just taking up space, while others would use it pretty differently
from the way Numeric3 did and store completely different keys.  The result
would be that Numeric3 would have to check for the keys that it wanted in
the meta dictionary.

Since I think you're going to allow folks to pass in their own buffer
objects to some of the array constructors (mmap for instance), the
underlying Numeric3 code can't really assume that the "meta" attribute is
there on all buffer objects.

If you wanted to annotate all buffers that were passed around inside of
Numeric, something like the following would work with "memory" and "mmap"
alike:

    # Composition of a memory buffer and meta data
    class NumericStorage(object):
        def __init__(self, buf, **meta):
            self.buf = buf
            self.meta = meta.copy()

Of course, at the C level it could just be a lightweight struct with two
PyObject pointers.
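
Just to make the "can't assume it's there" point concrete, here is a
hypothetical usage sketch (the file name, meta keys, and mmap setup are all
invented) that puts the NumericStorage wrapper above next to a bare mmap
object:

    import mmap

    # Build a small file-backed buffer to play with (details invented).
    tmp = open("example.dat", "wb")
    tmp.write(b"\x00" * 1024)
    tmp.close()
    f = open("example.dat", "r+b")
    rawbuf = mmap.mmap(f.fileno(), 1024)

    wrapped = NumericStorage(rawbuf, byteorder="little", aligned=True)

    for obj in (wrapped, rawbuf):                    # one annotated, one bare
        meta = getattr(obj, "meta", {})              # bare buffers lack .meta
        byteorder = meta.get("byteorder", "native")  # missing keys need defaults
        print(type(obj).__name__, byteorder)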

If you really wanted to add a meta attribute to the new generic memory
object, you could do:

    # Inheritance to add metadata to a memory buffer
    class NumericBytes(memory):
        def __init__(self, *args, **kwds):
            memory.__init__(self, *args, **kwds)
            self.meta = {}

It's a minor pain, but obviously inheritance like this can be done at the C
level too...

I don't know what particular meta data you plan to store with the buffer
itself, and I'm going to resist the urge to guess.  You probably have some
very good use cases.  What are you planning?  If you have a list of meta
keys that many if not all users would agree on, then it would be worth
considering just building them efficiently into the proposed type and not
wasting the overhead of a dictionary.  That would also standardize their
usage to some extent.
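
Purely as an illustration of "building them into the type" (the attribute
names here are invented, not a proposal), fixed attributes in place of a
catch-all dictionary might look like the following; at the C level these
would simply be members of the object struct:

    # Hypothetical sketch: agreed-upon meta data as fixed attributes of the
    # type (struct members at the C level) instead of a per-instance dict.
    class AnnotatedBytes(object):
        __slots__ = ("buf", "byteorder", "aligned")  # invented example names

        def __init__(self, buf, byteorder="native", aligned=True):
            self.buf = buf
            self.byteorder = byteorder
            self.aligned = aligned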

As I said before, this is all just a matter of taste.  I apologize for
using so much text to try to explain what I meant.  When all is said and
done, whether the C API code is required to check for keys in the meta
dictionary or for attributes of the object itself, it's probably a pretty
similar task.  It would be PyDict_GetItem(...) versus
PyObject_GetAttr(...).


> 
> > When I wrote the PEP, I had high hopes of creating a 
> > Python only "ndarray" class out of bytes and the struct
> > module
>
> Numarray essentially did this.   I think we still need a C-type object 
> for arrays.
> 

Yup.  I understand and appreciate your attention to performance.  For small
arrays, it's tough to argue that a C implementation won't win.

At the time, all I really needed was something to store and casually
inspect/manipulate my oddball data (large arrays of complex short) without
converting to a larger representation.  We have something very similar to
weave.inline that I used when it came time to go fast.

>
> I read the PEP again, and agree with Scott that it
> is quite good and would fit what we need quite well.
>
> I say let's resurrect it and push it forward.
>

Very cool.  I hope it does what you need and makes it into the core.  With
your enthusiasm, I wish I had time to finish or at least help with the
implementation.  Unfortunately, I'm more swamped at work now than I was
when I dropped the ball on this the first time.

>
> Scott, do you have any left-over code you could contribute?
>

I'll try to find what I had, but I probably don't have anything you'll find
much more valuable than the memory object from Numarray.  I remember I went
through a bit of pain to implement the "new style classes" correctly, but
the pickling stuff in the core of the Python library is where the real
challenge is, and I never got going on the TeX docs or unit tests that
would be necessary for acceptance.


Cheers,
    -Scott





