On 1/6/07, Travis Oliphant <oliphant@ee.byu.edu> wrote:
Tim Hochberg wrote:
> Christopher Barker wrote:
>
> [SNIP]
>
>> I think the PEP has far more chances of success if it's seen as a
>> request from a variety of package developers, not just the numpy crowd
>> (which, after all, already has numpy)
>>
> This seems eminently sensible. Getting a few developers from other
> projects on board would help a lot; it might also reveal some
> deficiencies to the proposal that we don't see yet.
>
It would help quite a bit.  Are there any suggestions of who to recruit
to review the proposal?

Before I can answer that, I need to ask you a question. How do you see this extension to the buffer protocol? Do you see it as a supplement to the earlier array protocol, or as a replacement for it?

The reason I ask is that the two projects I use regularly, wxPython and PIL, generally operate on relatively large data chunks, and it's not clear that they would see much benefit from this mechanism over the array protocol.

I imagine that between us Chris Barker and I could hack together something for wxPython (not that I've asked him about it), and working code would probably go a long way toward convincing people what a great idea this is. However, all else being equal, it'd be a lot easier to do this for the array protocol, since there's no extra infrastructure involved.
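
To make "no extra infrastructure" concrete, here is a rough sketch of what exposing the array protocol from pure Python might look like. ImageBuffer is a hypothetical stand-in, not actual wxPython or PIL code, and the RGB layout is an assumption chosen for illustration; only the __array_interface__ attribute and its "version", "shape", "typestr" and "data" keys come from the array protocol itself.

```python
class ImageBuffer:
    """Hypothetical stand-in for a wxPython/PIL-style image object."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # Assumed layout: 3 bytes per pixel (RGB), zero-initialised.
        self._data = bytearray(width * height * 3)

    @property
    def __array_interface__(self):
        # Array protocol, version 3: a plain dict describing the memory.
        return {
            "version": 3,
            "shape": (self.height, self.width, 3),
            "typestr": "|u1",    # unsigned 1-byte integers
            "data": self._data,  # any object exposing the buffer interface
        }


img = ImageBuffer(4, 2)
iface = img.__array_interface__
print(iface["shape"], iface["typestr"], len(iface["data"]))
```

A consumer that understands the protocol (numpy.asarray(img), for instance) should be able to view the pixels without copying, although this sketch has not been tried against wxPython or PIL.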

[SNIP] 

>          1. Why do we need Py_ARRAYOF? Can't we get the same effect just
>             using longer shape and strides arrays?
>
Yes, this is true for a single data-format in isolation (and in fact
exactly what you get when you instantiate in NumPy a data-type that is
an array of another primitive data-type).   However, how do you describe
a structure whose second field is an array of a primitive type?  This is
where the ARRAYOF qualifier is needed.  In NumPy, actually, it's not
done this way, but a separate subarray field in the data-type object is
used.  After studying c-types,  however, I think this approach is better.

OK. Needed for recursive data structures, check.
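
As an illustration of the case described above, here is the ctypes spelling of a structure whose second field is an array of a primitive type. The Record class and its field names are made up for the example. The array's length belongs to one field's type rather than to the buffer as a whole, so longer shape and strides arrays cannot express it.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical record: an integer id followed by three doubles.
    _fields_ = [
        ("id", ctypes.c_int32),
        ("samples", ctypes.c_double * 3),  # array-of-primitive as a field
    ]


rec = Record(7, (ctypes.c_double * 3)(1.0, 2.0, 3.0))

# The field descriptors carry each field's offset and size in bytes.
print(Record.id.offset, Record.samples.offset, Record.samples.size)
print(rec.id, list(rec.samples))
```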
 

>          2. Is there any type besides Py_STRUCTURE that can have names
>             and fields. If so, what and what do they mean. If not, you
>             should just say that.
>
Yes, you can add fields to a multi-byte primitive if you want.  This
would be similar to thinking about the data-format as a C-like union.
Perhaps the data-field has meaning as a 4-byte integer but the
most-significant and least-significant bytes should also be addressable
individually.

Hmm. I think I understand this somewhat better now, but I can't decide if it's cool or overkill. Is this supporting a feature that ctypes has?
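
For comparison, ctypes does have a Union type, and the 4-byte-integer example above can be sketched with it as follows. Word and its field names are hypothetical, and whether the proposal's fields-on-a-primitive feature is meant to mirror ctypes.Union is exactly the open question; this only shows the union-like reading.

```python
import ctypes
import sys


class Word(ctypes.Union):
    # Hypothetical example: the same 4 bytes viewed two ways.
    _fields_ = [
        ("value", ctypes.c_uint32),     # meaning as a 4-byte integer
        ("parts", ctypes.c_uint8 * 4),  # the individual bytes
    ]


w = Word()
w.value = 0x11223344
raw = list(w.parts)

# Which index holds the most-significant byte depends on byte order.
if sys.byteorder == "little":
    lsb, msb = raw[0], raw[3]
else:
    lsb, msb = raw[3], raw[0]

print(hex(msb), hex(lsb))
```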
 

>          3. And on this topic, why a tuple of ([names,..], {field})? Why
>             not simply a list of (name, dfobject, offset, meta) for
>             example? And what's the meta information if it's not PyNone?
>             Just a string? Anything at all?
>

The list of names is useful for having an ordered list so you can
traverse the structure in field order.   It is technically not necessary
but it makes it a lot easier to parse a data-format object in offset
order (it is used a bit in NumPy, for example).

Right, I got that. Between names and field you are simulating an ordered dict. What I still don't understand is why you chose to simulate this ordered dict using a list plus a dictionary rather than a list of tuples. This may well just be a matter of taste. However, for the small sizes I'd expect of these lists, I would expect a list of tuples to perform better than the dictionary solution.
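
The two layouts being compared can be sketched side by side. The format strings, offsets and helper names below are placeholders invented for the example, not taken from the proposal; a real data-format object would sit where the strings are.

```python
# Layout A: ([names, ...], {name: (format, offset, meta)})
names = ["id", "samples"]
fields = {"samples": ("3d", 8, None), "id": ("i", 0, None)}
layout_a = (names, fields)

# Layout B: [(name, format, offset, meta), ...], already in field order
layout_b = [("id", "i", 0, None), ("samples", "3d", 8, None)]


def walk_a(layout):
    """Traverse layout A in field order via the names list."""
    ordered_names, by_name = layout
    return [(name, by_name[name][1]) for name in ordered_names]


def walk_b(layout):
    """Traverse layout B; the list order is the field order."""
    return [(name, offset) for name, _fmt, offset, _meta in layout]


def offset_b(layout, wanted):
    """Lookup by name in layout B is a linear scan over a short list."""
    for name, _fmt, offset, _meta in layout:
        if name == wanted:
            return offset
    raise KeyError(wanted)


print(walk_a(layout_a))
print(walk_b(layout_b))
print(offset_b(layout_b, "samples"))
```

Both give the same traversal. The difference is only in lookup by name, where layout A uses a hash lookup and layout B scans the list, so which is faster for a handful of fields would need to be measured rather than assumed.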
 

The meta information is a place holder for field tags and future growth
(kind of like column headers in a spreadsheet).  It started as a place
to put a "longer" name or to pass along information about a field (like
units) through.

OK.

FWIW, the array protocol PEP seems more relevant to what I do: I'm sending big chunks of data back and forth, so I'm not that concerned with the overhead.

-tim