Re: [Numpy-discussion] Questions about the array interface.
--- Chris Barker <Chris.Barker@noaa.gov> wrote:
Again, I'm uncomfortable with something that I have to check being optional. If it is, we're encouraging people to not check it, and that's a recipe for bugs later on down the road.
[snip]
I guess all I'm saying is that I wouldn't assume the offset is zero...
Good point. All the more reason to have the offset be mandatory.
Lots of protocols have optional parts. The helper functions would hide this level of detail.
Yes, if there is a C/C++ version of all these helper functions, I'll be a lot happier. And you're right, the same information should not be encoded in two places, so my "iscontiguous" attribute should be a helper function or maybe a method.
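[Editor's sketch] The idea of deriving "iscontiguous" from the other attributes, rather than storing it separately, might look roughly like the following in Python. The function name, its signature, and the convention that missing strides mean a C-contiguous layout are all assumptions for illustration; none of them is taken from the proposal itself.

```python
def is_contiguous(shape, strides, itemsize):
    """Return True if the layout is C-contiguous (row-major).

    Hypothetical helper: `strides` is a tuple of byte strides, or None.
    Treating None as "contiguous" is an assumption of this sketch.
    """
    if strides is None:
        return True
    if 0 in shape:
        return True  # an empty array has no elements to misplace
    expected = itemsize
    # Walk from the fastest-varying (last) dimension to the slowest.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a length-1 dimension is never used, so skip it.
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True
```

Because the answer is computed from shape and strides on demand, there is no second copy of the information that could drift out of sync.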
In a short while, you shouldn't have to check any __array_metadata__ attributes directly. There should even be a helper function for getting the array elements.
Cool. How would that work? A C++ iterator? I'm thinking not, as this is all C, no?
I think this will take shape as an include file with static/inline functions. No linking required, just #include <ndarray.h> and call the functions. It would be nice but not necessary that this was distributed with Python. I would be in favor of having some C++ iterator interfaces (possibly a template class) inside of a #ifdef __cplusplus block. Python doesn't seem to have a lot of C++ in the core so I wonder if this would meet resistance (even when it's inside of a #ifdef block).
It wouldn't be a horrible mistake to have all the attributes be mandatory, but it doesn't get array consumers any benefit that they can't get from a well-written helper library, and it does add some burden to array producers.
Hardly any. I'm assuming that there will be a base_array class that can be used as a base class or mixin, so it wouldn't be any work at all to have a full set of attributes with defaults. It would take up a little bit of memory. I'm assuming that the whole point of this is to support large datasets, but maybe that isn't a valid assumption. After all, small array support has turned out to be very important for Numeric.
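[Editor's sketch] A base class or mixin that supplies defaults could be as small as the following. The class names and the optional attribute names (`__array_offset__`, `__array_strides__`) are assumptions made for illustration; the thread only says that an offset exists and that defaults would be provided.

```python
class BaseArrayMixin:
    # Hypothetical defaults for the optional attributes. They are class
    # attributes, so every instance shares a single copy.
    __array_offset__ = 0
    __array_strides__ = None


class FileArray(BaseArrayMixin):
    """Illustrative producer that only sets the required attributes."""

    def __init__(self, data):
        self.__array_data__ = data
        self.__array_shape__ = (len(data) // 4,)
        self.__array_typestr__ = '>i4'
```

A consumer can then read `obj.__array_offset__` on any `FileArray` without checking for its presence, while the instance dictionary holds only the three attributes set in `__init__`.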
If the protocol can make things easy without the use of a mixin or base class, all the better to my way of thinking. I don't think the memory use is very relevant as the attributes would only require storage in the class object, not the instances. There is something elegant about making array creation as easy as:

    class easy_array:
        def __init__(self, filename):
            data = open(filename, 'r').read()
            self.__array_data__ = data
            self.__array_shape__ = (len(data)/4,)
            self.__array_typestr__ = '>i4'

Like I said, I don't think it would be *horrible* to require all the attributes, but I don't see how it will benefit you at all. And even if all the attributes are mandatory, there are still a number of details to get right in reading the memory. You'll likely want to use the helper libraries/modules regardless. (Once they're completed of course...)
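[Editor's sketch] On the consumer side, a helper that hides the optional attributes might look something like this. The function name `read_elements` and the optional attribute names `__array_offset__` and `__array_strides__` are assumptions, and the sketch handles only contiguous 4-byte integers with an explicit byte order, which is the case the easy_array example above produces.

```python
import struct


def read_elements(obj):
    """Return the elements of a contiguous 'i4' array as a flat list.

    Hypothetical helper; not part of any published API.
    """
    data = obj.__array_data__        # bytes-like buffer
    shape = obj.__array_shape__
    typestr = obj.__array_typestr__  # e.g. '>i4'

    # Optional attributes: fall back to defaults when they are absent.
    offset = getattr(obj, '__array_offset__', 0)
    strides = getattr(obj, '__array_strides__', None)

    byteorder, kind, size = typestr[0], typestr[1], int(typestr[2:])
    if byteorder not in '<>' or (kind, size) != ('i', 4):
        raise NotImplementedError('sketch only handles <i4 and >i4')
    if strides is not None:
        raise NotImplementedError('sketch only handles contiguous data')

    count = 1
    for dim in shape:
        count *= dim

    # With an explicit byte order, struct uses the standard 4-byte 'i'.
    fmt = '%s%di' % (byteorder, count)
    return list(struct.unpack_from(fmt, data, offset))
```

The caller never inspects the attributes directly, so it makes no difference to the caller whether the producer supplied an offset or left it out.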
As a rule of thumb, I think there will be [more] consumers of arrays than producers, so I'd rather make it easy on the consumers than the producers, if we need to make such a trade off. Maybe I'm biased, because I'm a consumer.
I don't see the trade off. It will be easy for you either way, but harder for array producers (admittedly only a little). This has to be easier than the situation you have today, right? Imagine the code you'd have to write to special case Numeric, scipy.base, Numarray, and Python's array module.
Cheers,
-Scott
Scott Gilbert wrote:
--- Chris Barker <Chris.Barker@noaa.gov> wrote:
[SNIP]
As a rule of thumb, I think there will be [more] consumers of arrays than producers, so I'd rather make it easy on the consumers than the producers, if we need to make such a trade off. Maybe I'm biased, because I'm a consumer.
I don't see the trade off. It will be easy for you either way, but harder for array producers (admittedly only a little).
I think there is a trade off, but not the one that Chris is worried about. It should be easy to hide the complexity of dealing with missing attributes through the various helper functions. The cost will be in speed and will probably be most noticeable in C extensions using small arrays, where the extra code to check if an attribute is present will be significant. How significant this will be, I'm not sure. And frankly I don't care all that much since I generally only use large arrays. However, since one of the big faultlines between Numarray and Numeric involves the former's relatively poor small array performance, I suspect someone might care.
-tim
This has to be easier than the situation you have today, right? Imagine the code you'd have to write to special case Numeric, scipy.base, Numarray, and Python's array module.
Tim Hochberg wrote:
Scott Gilbert wrote:
--- Chris Barker <Chris.Barker@noaa.gov> wrote:
I don't see the trade off.
I wasn't sure it applied in this case, but if there were a trade off, we should make things easiest for the consumers of arrays.
I think there is a trade off, but not the one that Chris is worried about. It should be easy to hide the complexity of dealing with missing attributes through the various helper functions. The cost will be in speed and will probably be most noticeable in C extensions using small arrays, where the extra code to check if an attribute is present will be significant.
Actually, that is one I'm worried about. You're quite right, if I'm dealing with a 2X2 array, those helper functions are going to take much longer to run than accessing (and maybe using) the data. Like Tim, I'm mostly interested in using this for large data sets, but I think the small array thing might crop up unexpectedly. For example, with the current numarray, if you pass in an NX2 array to wxPython (to draw a polygon, for instance), it's very slow. It turns out that that's because a whole set of (2,) arrays are created when extracting the data, so even though you're dealing with a large data set, you end up dealing with a LOT of small arrays. Of course, the whole point of this is to avoid that, but I don't think we should assume that any overhead is negligible.
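[Editor's sketch] One rough way to get a feel for the per-lookup cost being discussed is a micro-benchmark like the one below. It measures Python-level attribute access only, not the C extension path Tim describes, so it is at best an indication. The attribute name `__array_offset__` and all class and function names are assumptions for illustration.

```python
import timeit


class Full:
    # Producer that always provides the (hypothetical) offset attribute.
    __array_offset__ = 0


class Sparse:
    # Producer that omits the optional attribute entirely.
    pass


def direct(obj):
    # What a consumer could do if the attribute were mandatory.
    return obj.__array_offset__


def with_default(obj):
    # What a consumer must do if the attribute is optional.
    return getattr(obj, '__array_offset__', 0)


def measure(number=100_000):
    """Return total seconds for `number` lookups of each kind."""
    full, sparse = Full(), Sparse()
    return {
        'mandatory, direct access': timeit.timeit(
            lambda: direct(full), number=number),
        'optional, attribute present': timeit.timeit(
            lambda: with_default(full), number=number),
        'optional, attribute missing': timeit.timeit(
            lambda: with_default(sparse), number=number),
    }
```

Calling `measure()` and printing the resulting dictionary gives three timings to compare. The absolute numbers depend on the interpreter and machine, so no expected values are given here.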
This has to be easier than the situation you have today, right?
Well, sure, though it seems to be harder than using the Numeric API directly. However, I'll shut up now, as it seems that the proposed utility functions will address my issues.
-Chris
PS to Tim: Want to help out with the wxPython integration?
--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov