Tim Hochberg wrote:
Christopher Barker wrote:
[SNIP]
I think the PEP has far more chances of success if it's seen as a request from a variety of package developers, not just the numpy crowd (which, after all, already has numpy).
This seems eminently sensible. Getting a few developers from other projects on board would help a lot; it might also reveal some deficiencies in the proposal that we don't see yet.
It would help quite a bit. Are there any suggestions of who to recruit to review the proposal? We should not forget that the NumPy world is quite diverse as well.
I've only given the PEP a quick read through at this point, but here are a couple of comments:
Thank you for taking the time to read through it. I know it takes precious effort to do all this, which is why it's been so slow in coming from my end. It is important to get a lot of discussion on something like this. A lot of what is in the PEP does stem from a lot of discussion that's happened in the past 10 years, but admittedly some of it doesn't (extended data-format descriptions, for example).
1. It seems very numpy-centric. That's not necessarily bad, but I think it would help to have some outsiders look it over; perhaps they would see things that they need that it doesn't address. Conversely, there may be universal opinion that some parts of it aren't needed, and we can strip the proposal down somewhat.
Yes, this is true. I took the struct module, NumPy, and c-types as a guide for "what is needed" to be described in terms of memory.
2. It seems pretty complicated. In particular, the PyDataFormatObject seems pretty complicated. This part in particular seems like it might be a hard sell, so I expect this is going to need considerably more motivation. For example:
Yes, the PyDataFormatObject is complicated, but I don't think unnecessarily so. I've already stripped away a lot of what's in NumPy to reduce it. The real question is how you are going to describe what an arbitrary chunk of memory represents. One could restrict it to primitive types, replace the PyDataFormatObject with an enumerated type, and just give up on describing more complicated structures. But my contention is: why? Numarray, NumPy, and c-types have already laid a tremendous amount of groundwork in how we can represent complicated data structures. They clearly exist, so why shouldn't we have some mechanism to describe them? Once you decide to handle complicated types, you need to replace the simple enumerated type with something that is "self-recursive" (i.e. so you can have fields of arbitrary data-types). This lends itself to some kind of structure design like the PyDataFormatObject. The only difference between what I've proposed and the c-types approach is that c-types overloads Python type objects. (In other words, the PyDataFormatObject equivalent in c-types is at its core a PyTypeObject, while here it is built on PyObject.)
1. Why do we need Py_ARRAYOF? Can't we get the same effect just using longer shape and strides arrays?
Yes, this is true for a single data-format in isolation (and it is in fact exactly what you get when you instantiate in NumPy a data-type that is an array of another primitive data-type). However, how do you describe a structure whose second field is an array of a primitive type? This is where the ARRAYOF qualifier is needed. In NumPy, actually, it's not done this way; instead, a separate subarray field in the data-type object is used. After studying c-types, however, I think this approach is better.
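As a rough illustration of the case described above (a structure whose second field is an array of a primitive type), here is how the standard ctypes module expresses it. This is only an analogy for the ARRAYOF idea, not the PEP's API; the Record type and its field names are hypothetical.

```python
import ctypes

# Hypothetical record layout, chosen only for illustration.
class Record(ctypes.Structure):
    _fields_ = [
        ("id", ctypes.c_int32),
        # The field's type is itself "array of 3 floats". Longer shape and
        # strides arrays on the outer buffer could not express this, because
        # the array lives inside one field of each record.
        ("samples", ctypes.c_float * 3),
    ]

rec = Record(7, (ctypes.c_float * 3)(1.0, 2.0, 3.0))

samples_offset = Record.samples.offset   # byte offset of the array field
record_size = ctypes.sizeof(Record)      # total size of one record
sample_values = list(rec.samples)        # contents of the nested array

print(samples_offset, record_size, sample_values)
```

The array-ness is attached to the field's own type description, which is the role the ARRAYOF qualifier plays in the proposal.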
2. Is there any type besides Py_STRUCTURE that can have names and fields? If so, which ones, and what do they mean? If not, you should just say that.
Yes, you can add fields to a multi-byte primitive if you want. This would be similar to thinking about the data-format as a C-like union. Perhaps the data-field has meaning as a 4-byte integer but the most-significant and least-significant bytes should also be addressable individually.
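The union analogy can be sketched with the standard ctypes module. This is a minimal sketch under assumptions, not the PEP's mechanism: the Int32WithBytes type and its field names are hypothetical, and the byte positions are chosen from the host byte order.

```python
import ctypes
import sys

# Hypothetical union: the same 4 bytes seen either as one 32-bit integer
# or as four individually addressable bytes.
class Int32WithBytes(ctypes.Union):
    _fields_ = [
        ("value", ctypes.c_uint32),
        ("bytes", ctypes.c_uint8 * 4),
    ]

x = Int32WithBytes(value=0x11223344)

# Which index holds the most-significant byte depends on the host byte order.
if sys.byteorder == "little":
    lsb_index, msb_index = 0, 3
else:
    lsb_index, msb_index = 3, 0

lsb = x.bytes[lsb_index]
msb = x.bytes[msb_index]

print(hex(x.value), hex(msb), hex(lsb))
```

Attaching named fields to a primitive data-format would give the same effect: the whole item keeps its meaning as an integer while selected bytes get names of their own.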
3. And on this topic, why a tuple of ([names,..], {field})? Why not simply a list of (name, dfobject, offset, meta) for example? And what's the meta information if it's not PyNone? Just a string? Anything at all?
The list of names is useful for having an ordered list so you can traverse the structure in field order. It is technically not necessary, but it makes it a lot easier to parse a data-format object in offset order (it is used a bit in NumPy, for example). The meta information is a placeholder for field tags and future growth (kind of like column headers in a spreadsheet). It started as a place to put a "longer" name or to pass along information about a field (like units).
-Travis
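To make the ([names, ...], {fields}) layout concrete, here is a small pure-Python sketch. Everything in it is an assumption made for illustration: struct format strings stand in for data-format objects, and the field names, offsets, and the "units" meta entry are hypothetical.

```python
import struct

# Ordered list of field names (field order, which here is also offset order).
names = ["id", "temperature", "flags"]

# Mapping of name -> (format, byte offset, meta). The dictionary is written
# out of order on purpose: the names list is what carries the ordering.
fields = {
    "flags": ("<H", 12, None),
    "id": ("<i", 0, None),
    "temperature": ("<d", 4, {"units": "kelvin"}),
}

layout = (names, fields)

def unpack_record(layout, buf):
    """Walk the fields in declared order and decode each one from buf."""
    ordered_names, field_map = layout
    result = []
    for name in ordered_names:
        fmt, offset, meta = field_map[name]
        (value,) = struct.unpack_from(fmt, buf, offset)
        result.append((name, value, meta))
    return result

# Build a 14-byte packed record: int32, float64, uint16 (no padding with "<").
buf = struct.pack("<idH", 7, 273.15, 3)
record = unpack_record(layout, buf)

for name, value, meta in record:
    print(name, value, meta)
```

A flat list of (name, dfobject, offset, meta) tuples would carry the same information; the split form trades that simplicity for direct lookup by name while keeping an explicit traversal order.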