[Numpy-discussion] Generator arrays
oliphant at enthought.com
Fri Jan 28 16:18:46 EST 2011
Thanks for the long email. I think there are a lot of thoughts around some of these ideas and it is good to get as many of them articulated as possible.
I learn much from these kinds of discussions. I think others value them as well.
I like your ideas about which overloading hooks subclasses of ndarray should really be allowed to override.
One thing I didn't talk about in my previous long email was the re-organization of calculation functions that needs to happen. I really think that the ufunc concept needs to be broadened so that all function-pointers that are currently attached to data-type objects can be handled under the same calculation super-structure.
This re-factoring would go a long way toward cementing what kind of API is needed for different "array objects". I am persuaded that improving the framework for vectorized calculations so that it allows any array-like object (an object satisfying a certain protocol or API) is a better approach than altering the nice mapping of ndarray to in-memory data.
Then, deferred arrays, masked arrays, computed arrays, and other array-like objects could provide protocols and APIs (and callbacks) that satisfy this general calculation structure. This kind of generalization is probably more useful than changes to the array object itself.
But, it's also hard and I'm not entirely sure what that structure should be. I'm looking forward to thoughts in this direction and looking more closely at what Mark has done with writing ufuncs as wrappers around his iterators. I'm concerned that his iterators don't support the generalized ufunc interface that I really like and was hoping would provide the abstraction needed to allow all the functions currently attached to dtypes (searchsorted, etc.) to be incorporated in the generalized calculation structure.
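To make the generalized ufunc idea concrete, here is a minimal sketch, not a proposal for the actual structure. It assumes a NumPy version in which `np.vectorize` accepts a `signature` argument, and the name `searchsorted_gu` is purely illustrative. It treats searchsorted as an operation whose core dimensions are "a sorted vector and a scalar give a scalar", with broadcasting handled over the remaining loop dimensions.

```python
import numpy as np

# Core signature "(n),()->()": one sorted 1-D vector plus one scalar
# produce one scalar index. Leading (loop) dimensions are broadcast.
# `searchsorted_gu` is a hypothetical name used only for this sketch.
searchsorted_gu = np.vectorize(np.searchsorted, signature="(n),()->()")

sorted_rows = np.array([[1, 3, 5, 7],
                        [2, 4, 6, 8]])
values = np.array([4, 7])

# Row 0 is searched for 4, row 1 is searched for 7.
idx = searchsorted_gu(sorted_rows, values)
print(idx)
```

This is only a Python-level emulation; it says nothing about how such functions would be registered or dispatched inside the C calculation machinery.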
On Jan 28, 2011, at 9:25 AM, Lluís wrote:
> Travis Oliphant writes:
>> This concept has as one use-case, the deferred arrays that Mark Wiebe
>> has proposed.
> Interesting, I didn't read about that.
> In fact, I was playing around with a proxy wrapper for ndarrays not long
> ago, in order to build a tree of deferred operations that can be later
> optimized through numexpr once __str__ or __repr__ is called on such a
> deferred object. The idea was to have something like:
> a = np.array(...)
> a = defereval(a) # returns a proxy wrapper for known methods of np.ndarray
> b = 10 + a ** 2
> print b # here the tree of deferred operations is flattened
> # into a string that numexpr can use
> I didn't play much with it, but proxying all methods but __str__ and
> __repr__ (thus iterating on the original a.__dict__) seemed to suffice.
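A rough sketch of the proxy idea described above, under stated assumptions: the class `Deferred` and its methods are hypothetical names, only a few operators are proxied, operands are assumed to be ndarrays or plain Python numbers, and each wrapped array is assumed to get a unique name. Plain `eval` stands in for the evaluator so the sketch has no dependencies beyond NumPy; the recorded string is the kind of expression that could instead be handed to numexpr.

```python
import numpy as np

class Deferred:
    """Hypothetical proxy that records operations instead of running them."""

    def __init__(self, expr, operands):
        self.expr = expr          # expression string, e.g. "(10 + (a ** 2))"
        self.operands = operands  # maps variable name -> ndarray

    @classmethod
    def wrap(cls, array, name):
        # Assumes `name` is unique among the arrays in one expression.
        return cls(name, {name: np.asarray(array)})

    def _binary(self, other, op, reflected=False):
        if isinstance(other, Deferred):
            rhs = other.expr
            operands = {**self.operands, **other.operands}
        else:
            rhs = repr(other)     # plain Python scalars only
            operands = dict(self.operands)
        left, right = (rhs, self.expr) if reflected else (self.expr, rhs)
        return Deferred(f"({left} {op} {right})", operands)

    def __add__(self, other):
        return self._binary(other, "+")

    def __radd__(self, other):
        return self._binary(other, "+", reflected=True)

    def __mul__(self, other):
        return self._binary(other, "*")

    def __rmul__(self, other):
        return self._binary(other, "*", reflected=True)

    def __pow__(self, other):
        return self._binary(other, "**")

    def evaluate(self):
        # Stand-in evaluator; the same string and operand dict could be
        # passed to an expression compiler such as numexpr instead.
        return eval(self.expr, {"__builtins__": {}}, dict(self.operands))

    def __repr__(self):
        # Evaluation is forced only when the result is displayed.
        return repr(self.evaluate())

a = Deferred.wrap(np.array([1.0, 2.0, 3.0]), "a")
b = 10 + a ** 2        # nothing is computed yet
print(b.expr)          # the flattened operation tree
print(b)               # evaluation happens here
```

Keeping the tree as a string is the simplest choice for this illustration; a real implementation would more likely keep a node structure so that it can be simplified before evaluation.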
> The benefits I see of building this into ndarray itself is that ndarray
> would then be the hourglass waist of the framework.
> Subclassing ndarray is moderately complex right now, so I think that
> having a way to move some of these subclasses below the hourglass waist
> and not having to deal with the overloading of ndarray's UI would be a
> big step forward towards extension code simplicity.
> So, having near-zero knowledge on the internals of numpy and all new
> features that have been discussed here, my naive view of what the stack
> should contain is:
> * ndarray subclasses
> Overload indexing (e.g., data_array's named dimension elements),
> translating any fancy indexing into ndarray's "native" indexing
> Overload user representation (e.g., show some extra info when printing
> an array)
> * ndarray slicing and numeric operations
> A central point for slicing/indexing (the output should be either
> views or copies)
> A central point to control the deferral of operations (both native and
> extensions - see below -). In fact, I see deferred operations as just
> a form of copy-on-write/evaluate-on-access views (COW must be used
> when one of the input operands of a deferred tree of operations is
> modified after capturing it into such a tree).
> * numeric operations extensions
> Numeric operations should be first-class if deferred operation
> evaluation is to be taken to its highest potential, and thus they
> should be aware of an "operation evaluation engine" (as well as the
> other way around).
> If they are not (and they should be able not to be), two things can happen:
> - for those based only on first-class operations, it is just the root
> of a subtree
> - if more complex operations are performed (explicit looping?), they
> simply diminish the range of possibilities of optimizing operation
> evaluation (actually producing multiple evaluation trees, or maybe
> simply forcing evaluation).
> * operation evaluation engine
> This would take care of evaluating the operation tree, while
> performing optimizations on it.
> Fortunately, if a sensible interface is established between this and
> first-class numeric operations, a first implementation can provide
> just the naive evaluation, and further optimizations can be provided
> behind the scenes.
> Such optimizations would provide things like operation tree
> simplification/reorganization, blocking (a la numexpr) and
> parallelization of computations.
> * storage access extensions
> Slicing in ndarray should be aware of objects represented by means
> other than "plain strided memory buffers": e.g., the compressed array
> case (where decompression could be treated with a sliding window), or
> deferred operation evaluation itself.
> In fact, as you pointed out with the MEMORY flag, both storage and
> operation evaluation can be subject to the common concept of deferral
> (accessing a compressed array is just another form of accessing
> computed contents, like accessing elements on a deferred array).
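The "operation evaluation engine" layer in the list above can be sketched as follows. This is a minimal illustration under assumptions, not a design: the tuple-based tree format and the function names `evaluate` and `evaluate_blocked` are invented here, leaves are assumed to be equal-length 1-D ndarrays or scalars, and `length` is assumed to be greater than zero. It shows a naive evaluator first, and then a blocked variant (in the spirit of numexpr) that reuses the same interface.

```python
import operator
import numpy as np

# Hypothetical tree format: ("op", left, right) tuples with ndarray or
# scalar leaves.
OPS = {"+": operator.add, "*": operator.mul, "**": operator.pow}

def evaluate(node, start=None, stop=None):
    """Naive recursive evaluation, optionally restricted to one slice."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, start, stop),
                       evaluate(right, start, stop))
    if isinstance(node, np.ndarray):
        return node[start:stop]   # leaf array: touch only the requested part
    return node                   # leaf scalar

def evaluate_blocked(node, length, block=4096):
    """Evaluate the same tree block by block to limit temporary sizes."""
    out = None
    for start in range(0, length, block):
        chunk = evaluate(node, start, start + block)
        if out is None:
            out = np.empty(length, dtype=chunk.dtype)
        out[start:start + len(chunk)] = chunk
    return out

a = np.arange(10.0)
tree = ("+", 10, ("**", a, 2))            # represents 10 + a ** 2
full = evaluate(tree)
blocked = evaluate_blocked(tree, len(a), block=3)
```

Because both functions accept the same tree, the blocked version can replace the naive one behind the scenes without changing how operations are recorded, which is the kind of interface separation the list argues for.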
> I just hope these are not all just obvious observations of what has
> already been said.
> PS: sorry for the unnecessarily long mail
> "And it's much the same thing with knowledge, for whenever you learn
> something new, the whole world becomes that much richer."
> -- The Princess of Pure Reason, as told by Norton Juster in The Phantom