Timothy Hochberg wrote:
On 1/6/07, *Travis Oliphant* <oliphant@ee.byu.edu> wrote:
Tim Hochberg wrote:
> Christopher Barker wrote:
>
> [SNIP]
>
>> I think the PEP has far more chances of success if it's seen as a
>> request from a variety of package developers, not just the numpy crowd
>> (which, after all, already has numpy
>>
> This seems eminently sensible. Getting a few developers from other
> projects on board would help a lot; it might also reveal some
> deficiencies to the proposal that we don't see yet.
>
It would help quite a bit. Are there any suggestions of who to recruit to
review the proposal?
Before I can answer that, I need to ask you a question. How do you see this extension to the buffer protocol? Do you see it as a supplement to the earlier array protocol, or do you see it as a replacement?
This is a replacement for the previously described array protocol PEP. It is how I'm trying to get the array protocol into Python.
In that vein, it has two purposes:
One is to make a better buffer protocol that includes a conception of an N-dimensional array in Python itself. If we can get this into Python, then we get a lot of mileage out of all the people who write extension modules for Python and who should really be making their memory available as an N-dimensional array (every time I turn around there is a new wrapping of some library that is *not* using NumPy as the underlying extension). With the existence of ctypes it only gets worse, because nobody thinks about exposing things as arrays anymore, and so NumPy users don't get the ease of use we would have if the N-dimensional array concept were a part of Python itself.
For example, I just found the FreeImage project, which wraps a nice library using ctypes. But it doesn't have a way to expose these images as numpy arrays. Now, it would probably take me only a few hours to make the connection between FreeImage and NumPy, but I'd like to see the day when it happens without me (or some other NumPy expert) having to do all the work. If ctypes objects exposed the extended buffer protocol for appropriate types, then I wouldn't have to do anything: the wrapped structures would already be exposable as arrays, and all of a sudden I could say
a = array(freeimobj)
and do math on the array in Python.
Or if I'm an extension module writer, I don't need to have NumPy (or rely on it) in order to do some computation on freeimobj in C itself.
Sure, you can do it now if the array protocol is followed, but not many people have adopted it yet (some have argued that it's "not in Python itself"). So, I guess, the big reason I'm pushing this is largely marketing.
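For what it's worth, here is a minimal sketch of what "doing it now" looks
like with the existing __array_interface__ form of the array protocol. The
FreeImageBuffer class and its attributes are hypothetical stand-ins for
whatever a real ctypes wrapper would actually expose:

    # Hypothetical ctypes-backed image buffer exposing the array protocol.
    import ctypes
    import numpy as np

    class FreeImageBuffer(object):
        def __init__(self, width, height):
            # Pretend this memory really came from the wrapped C library.
            self._data = (ctypes.c_ubyte * (width * height))()
            self.__array_interface__ = {
                'version': 3,
                'shape': (height, width),
                'typestr': '|u1',   # unsigned bytes
                'data': (ctypes.addressof(self._data), False),
            }

    img = FreeImageBuffer(640, 480)
    a = np.asarray(img)   # NumPy consumes the interface; no copy is made
    a += 1                # ordinary array math on the wrapped memory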
The buffer protocol is the "right" place to put the array protocol.
The second reason is to ensure that the buffer protocol itself doesn't "disappear" in Python 3000. Not all the Python devs seem to really see the value of it, and it can sometimes be unclear what the attitudes are.
> 2. Is there any type besides Py_STRUCTURE that can have names
> and fields? If so, what, and what do they mean? If not, you
> should just say that.

Yes, you can add fields to a multi-byte primitive if you want. This would be similar to thinking about the data-format as a C-like union. Perhaps the data-field has meaning as a 4-byte integer, but the most-significant and least-significant bytes should also be addressable individually.
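As a rough illustration (using NumPy's existing dtype as the model, with
made-up field names), an overlay like that can be spelled by giving fields
explicit offsets:

    import numpy as np

    # A 4-byte unsigned integer whose least- and most-significant bytes
    # are also addressable as named fields (offsets overlap on purpose).
    overlay = np.dtype({
        'names':    ['value', 'lsb', 'msb'],
        'formats':  ['<u4',   'u1',  'u1'],
        'offsets':  [0,        0,     3],
        'itemsize': 4,
    })

    x = np.zeros(1, dtype=overlay)
    x['value'] = 0x01020304
    low, high = x['lsb'][0], x['msb'][0]   # the individual bytes, by name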
Hmm. I think I understand this somewhat better now, but I can't decide if it's cool or overkill. Is this supporting a feature that ctypes has?
I don't know. It's basically a situation where it's easier to support it than not to, and so it's there.
> 3. And on this topic, why a tuple of ([names,..], {field})? Why
> not simply a list of (name, dfobject, offset, meta) for
> example? And what's the meta information if it's not PyNone?
> Just a string? Anything at all?

The list of names is useful for having an ordered list so you can traverse the structure in field order. It is technically not necessary, but it makes it a lot easier to parse a data-format object in offset order (it is used a bit in NumPy, for example).
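This mirrors what NumPy's dtype already does, for what it's worth; a quick
sketch of the two pieces:

    import numpy as np

    dt = np.dtype([('x', '<f8'), ('y', '<f8'), ('index', '<i4')])

    dt.names    # ('x', 'y', 'index') -- the ordered list of field names
    dt.fields   # {'x': (dtype('float64'), 0),
                #  'y': (dtype('float64'), 8),
                #  'index': (dtype('int32'), 16)} -- name -> (format, offset)

    # Traversing the structure in field/offset order is then just:
    for name in dt.names:
        fmt, offset = dt.fields[name][:2]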
Right, I got that. Between names and fields you are simulating an ordered dict. What I still don't understand is why you chose to simulate this ordered dict using a list plus a dictionary rather than a list of tuples. This may well just be a matter of taste. However, for the small sizes I'd expect of these lists, a list of tuples would probably perform better than the dictionary solution.
Ah. I misunderstood. You are right that if I had considered the need for an ordered list of names up front, this kind of thing would make more sense. I think the reason for the choice of a dictionary is that I was thinking of field access as attribute look-up, which is just dictionary look-up, so conceptually that was easier for me. But tuples probably have less overhead (especially for small numbers of fields), at the expense of having to search for the field name on field access.
But I'm trusting that dictionaries (especially small ones) are pretty well optimized in Python (I haven't tested that assertion in this particular case, though).
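Just to make the comparison concrete, the flat list-of-tuples spelling Tim
is describing would look something like this (purely illustrative; lookup
by name becomes a linear scan instead of a dictionary access):

    # (name, data-format, offset, meta) per field, in offset order.
    fields = [
        ('x',     '<f8', 0,  None),
        ('y',     '<f8', 8,  None),
        ('index', '<i4', 16, None),
    ]

    def lookup(fields, name):
        # Linear search; for a handful of fields this is cheap.
        for fname, fmt, offset, meta in fields:
            if fname == name:
                return fmt, offset, meta
        raise KeyError(name)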
FWIW, the array protocol PEP seems more relevant to what I do, since I'm not so concerned with the overhead: I'm sending big chunks of data back and forth.
This proposal is trying to get the array protocol *into* Python. So, this is the array protocol PEP. Anyone supportive of the array protocol should be interested in and thinking about this PEP.
-Travis