[Numpy-discussion] Latest Array-Interface PEP

Mon Jan 8 22:57:32 EST 2007

On 1/6/07, Travis Oliphant <oliphant at ee.byu.edu> wrote:
>
> Tim Hochberg wrote:
> > Christopher Barker wrote:
> >
> > [SNIP]
> >
> >> I think the PEP has far more chances of success if it's seen as a
> >> request from a variety of package developers, not just the numpy crowd
> >> (which, after all, already has numpy
> >>
> > This seems eminently sensible. Getting a few developers from other
> > projects on board would help a lot; it might also reveal some
> > deficiencies to the proposal that we don't see yet.
> >
> It would help quite a bit.  Are there any suggestions of who to recruit
> to review the proposal?

Before I can answer that, I need to ask you a question. How do you see this
extension to the buffer protocol? Do you see it as a supplement to the
earlier array protocol, or as a replacement?

The reason I ask is that the two projects I use regularly, wxPython and
PIL, generally operate on relatively large data chunks, and it's not clear
that they would see much benefit from this mechanism over the array
protocol.

I imagine that between us Chris Barker and I could hack together something
for wxPython (not that I've asked him about it). And code would probably go a
long way to convincing people what a great idea this is. However, all else
being equal, it'd be a lot easier to do this for the array protocol since
there's no extra infrastructure involved.
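
To make "no extra infrastructure" concrete, here is a rough sketch of how an
image-like object might expose its pixels through the array protocol from
pure Python. RGBImage is a hypothetical stand-in, not a real wxPython or PIL
class, and the 8-bit RGB layout is an assumption made for illustration.

```python
class RGBImage:
    """Hypothetical image object holding 8-bit RGB pixels in one buffer."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # One contiguous block of bytes: height rows, width pixels, 3 channels.
        self._pixels = bytearray(width * height * 3)

    @property
    def __array_interface__(self):
        # Array protocol dictionary (version 3 of the interface).
        return {
            "version": 3,
            "shape": (self.height, self.width, 3),
            "typestr": "|u1",      # unsigned 1-byte integers, no byte order
            "data": self._pixels,  # an object exposing the buffer interface
            "strides": None,       # None means C-contiguous
        }


img = RGBImage(4, 2)
interface = img.__array_interface__
print(interface["shape"])
print(len(interface["data"]))
```

A consumer such as NumPy should then be able to wrap the same memory with
numpy.asarray(img), with no C code needed on the producer's side.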

[SNIP]

> >          1. Why do we need Py_ARRAYOF? Can't we get the same effect just
> >             using longer shape and strides arrays?
> >
> Yes, this is true for a single data-format in isolation (and in fact
> exactly what you get when you instantiate in NumPy a data-type that is
> an array of another primitive data-type).   However, how do you describe
> a structure whose second field is an array of a primitive type?  This is
> where the ARRAYOF qualifier is needed.  In NumPy, actually, it's not
> done this way, but a separate subarray field in the data-type object is
> used.  After studying c-types,  however, I think this approach is better.

OK. Needed for recursive data structures, check.
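
For anyone following along, this is roughly what the ctypes flavor of "a
structure whose second field is an array of a primitive type" looks like.
It's only a sketch: Record and its field names are made up for illustration,
and it shows ctypes itself, not the PEP's Py_ARRAYOF machinery.

```python
import ctypes

# An "array of" type: three C doubles laid out contiguously.
Double3 = ctypes.c_double * 3


class Record(ctypes.Structure):
    # Hypothetical record: an integer id followed by an array-valued field.
    _fields_ = [
        ("id", ctypes.c_int32),
        ("samples", Double3),
    ]


rec = Record(7, Double3(1.0, 2.0, 3.0))
print(len(rec.samples))        # number of elements in the second field
print(list(rec.samples))       # contents of the array-valued field
print(ctypes.sizeof(Double3))  # bytes occupied by the array type alone
```

The array-ness lives in the field's type, not in the shape of whatever
contains the record, which is the case that longer shape and strides arrays
can't express.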

> >          2. Is there any type besides Py_STRUCTURE that can have names
> >             and fields. If so, what and what do they mean. If not, you
> >             should just say that.
> >
> Yes, you can add fields to a multi-byte primitive if you want.  This
> would be similar to thinking about the data-format as a C-like union.
> Perhaps the data-field has meaning as a 4-byte integer but the
> most-significant and least-significant bytes should also be addressable
> individually.

Hmm. I think I understand this somewhat better now, but I can't decide if
it's cool or overkill. Is this supporting a feature that ctypes has?
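
The closest thing in ctypes that I know of is ctypes.Union. Here is a small
sketch of the 4-byte integer case described above, with the individual bytes
addressable through a second, overlapping field. Int32WithBytes, octets, and
the two helper functions are hypothetical names, and which index holds the
most-significant byte depends on the machine's byte order.

```python
import ctypes
import sys


class Int32WithBytes(ctypes.Union):
    # Both fields start at offset 0 and share the same four bytes.
    _fields_ = [
        ("value", ctypes.c_uint32),
        ("octets", ctypes.c_uint8 * 4),
    ]


def most_significant_byte(u):
    # On little-endian machines the most-significant byte is stored last.
    return u.octets[3] if sys.byteorder == "little" else u.octets[0]


def least_significant_byte(u):
    # On little-endian machines the least-significant byte is stored first.
    return u.octets[0] if sys.byteorder == "little" else u.octets[3]


u = Int32WithBytes()
u.value = 0x11223344
print(hex(most_significant_byte(u)))
print(hex(least_significant_byte(u)))
```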

> >          3. And on this topic, why a tuple of ([names,..], {field})? Why
> >             not simply a list of (name, dfobject, offset, meta) for
> >             example? And what's the meta information if it's not PyNone?
> >             Just a string? Anything at all?
> >
>
> The list of names is useful for having an ordered list so you can
> traverse the structure in field order.   It is technically not necessary
> but it makes it a lot easier to parse a data-format object in offset
> order (it is used a bit in NumPy, for example).

Right, I got that. Between names and field you are simulating an ordered
dict. What I still don't understand is why you chose to simulate this
ordered dict using a list plus a dictionary rather than a list of tuples.
This may well just be a matter of taste. However, for the small sizes I'd
expect of these lists, I would expect a list of tuples to perform better
than the dictionary solution.
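
To spell out the two alternatives, here is a sketch of the same three-field
format in both layouts. The field names, the format strings standing in for
dfobject, and the exact contents of the dictionary values are my assumptions
for illustration, not the PEP's definition.

```python
# Layout A: ([names, ...], {name: (dfobject, offset, meta)})
names = ["x", "y", "flags"]
fields = {
    "x": ("f8", 0, None),
    "y": ("f8", 8, None),
    "flags": ("u1", 16, "bit flags"),
}
format_a = (names, fields)

# Layout B: a plain list of (name, dfobject, offset, meta) tuples.
format_b = [
    ("x", "f8", 0, None),
    ("y", "f8", 8, None),
    ("flags", "u1", 16, "bit flags"),
]


def iter_fields_a(fmt):
    """Walk layout A in field order, using the names list for ordering."""
    ordered_names, by_name = fmt
    for name in ordered_names:
        dfobject, offset, meta = by_name[name]
        yield name, dfobject, offset, meta


def lookup_b(fmt, wanted):
    """Find a field by name in layout B with a linear scan."""
    for name, dfobject, offset, meta in fmt:
        if name == wanted:
            return dfobject, offset, meta
    raise KeyError(wanted)


print(list(iter_fields_a(format_a)) == format_b)
print(lookup_b(format_b, "y"))
```

Layout B gives ordered traversal for free and pays with a linear scan for
lookup by name, which I'd guess doesn't matter for a handful of fields.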

> The meta information is a place holder for field tags and future growth
> (kind of like column headers in a spreadsheet).  It started as a place
> to put a "longer" name or to pass along information about a field (like
> units) through.

OK.

FWIW, the array protocol PEP seems more relevant to what I do: I'm sending
big chunks of data back and forth, so I'm not so concerned with the
overhead.

-tim