[Numpy-discussion] question about optimizing

Charles R Harris charlesr.harris at gmail.com
Sat May 17 15:45:53 EDT 2008


On Sat, May 17, 2008 at 1:18 PM, Anne Archibald <peridot.faceted at gmail.com>
wrote:

> 2008/5/17 Brian Blais <bblais at bryant.edu>:
>
> > At least for me, that was the motivation.  I am trying to build a
> > simulation framework for part of the brain, which requires connected
> > layers of nodes.  A layer is either a 1D or 2D structure of nodes, with
> > each node a relatively complex beast.  Rather than reinvent the indexing
> > (1D, 2D, slicing, etc.), I just inherited from ndarray.  I thought, after
> > the fact, that some numpy functions on arrays would help speed up the
> > code, which consists mostly of calling an update function on all nodes,
> > passing each of them an input vector.  I wasn't sure if there would be
> > any speedup for this, compared to
> > for n in self.flat:
> >    n.update(input_vector)
> > From the response, the answer seems to be no, and I should stick with
> > the Python loops for clarity.  But Anne Archibald's words also make me
> > think that I made a bad choice by inheriting from ndarray, although I am
> > not sure what a convenient alternative would be.
>
> Well, it doesn't exist yet, but a handy tool would be a factory
> function "ArrayOf"; you would pass it a class, and it would produce a
> subclass of ndarray designed to contain that class.


Subclasses should generally be avoided unless they satisfy the "is a"
criterion, which I don't think a matrix stack does. That is to say, a
subclass should behave as an ndarray in *all* ways except for added
functionality. All we want is an item type with dimensions.

> That is, the underlying storage would be a record array, but the
> getitem and setitem would automatically handle conversion to and from
> the class you supplied it, where appropriate.


I don't think getitem and setitem should be overloaded in a subclass; that
is where the matrix class went wrong. Those methods should not be considered
"virtual". So if you have an array of special elements, then you are allowed
a[0][i, j], but you aren't allowed a[0, i, j]. I think this has the virtue
of sense and consistency: you can overload the operators of the type any way
you please, but you can't overload the operators of ndarray.

I suppose we could specify which operators in ndarray *could* be considered
virtual. That, after all, is what designing a base class is all about. But
we haven't done that.
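
To make the indexing distinction concrete, here is a minimal sketch using a plain object array as a stand-in for "an array of special elements". The element type (small 2-D arrays) and the names are my own choice for illustration, not anything proposed in this thread.

```python
import numpy as np

# A 1-D object array whose elements are themselves 2-D arrays.
a = np.empty(3, dtype=object)
for k in range(3):
    a[k] = np.arange(6).reshape(2, 3) + 10 * k

# Two-step indexing: ndarray selects the element, then the element's own
# __getitem__ handles [i, j].
elem = a[0][1, 2]

# One-step indexing hands all three indices to `a`, which is only 1-D,
# so NumPy raises IndexError instead of forwarding [i, j] to the element.
try:
    a[0, 1, 2]
    one_step_ok = True
except IndexError:
    one_step_ok = False
```

The element type decides what [i, j] means; ndarray's own indexing is left untouched.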


>
> myarray = ArrayOf(Node,dtype=...)
> A = myarray.array([Node(...), Node(...), Node(...)])
> n = A[1]
> A[2] = Node(...)
> A.C.update() # python-loop-based update of all elements
>
> You could also design it so that it was easy to derive a class from
> it, since that's probably the best way to handle vectorized methods:
>
> class myarray(ArrayOf(Node, dtype=...)):
>    def update(self):
>        self.underlying["node_attribute"] += 1
>
>
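
Since ArrayOf doesn't exist, here is a rough sketch of what the conversion could look like if it were done with a thin wrapper around a record array rather than by overriding ndarray's getitem and setitem. Everything here (array_of, Container, the Node class and its two fields) is hypothetical and only handles integer indices.

```python
import numpy as np

class Node:
    """Hypothetical element class with two attributes."""
    def __init__(self, node_attribute=0, weight=0.0):
        self.node_attribute = node_attribute
        self.weight = weight

def array_of(cls, dtype):
    """Hypothetical factory: a container storing `cls` instances as records."""
    dtype = np.dtype(dtype)

    class Container:
        def __init__(self, items):
            # Contiguous record storage, one record per instance.
            rows = [tuple(getattr(x, name) for name in dtype.names) for x in items]
            self.underlying = np.array(rows, dtype=dtype)

        def __len__(self):
            return len(self.underlying)

        def __getitem__(self, index):
            # Integer index only in this sketch: record -> instance.
            record = self.underlying[index].item()
            return cls(**dict(zip(dtype.names, record)))

        def __setitem__(self, index, value):
            # Instance -> record.
            self.underlying[index] = tuple(getattr(value, name) for name in dtype.names)

    return Container

NodeArray = array_of(Node, [("node_attribute", np.int64), ("weight", np.float64)])
A = NodeArray([Node(1, 0.5), Node(2, 1.5), Node(3, 2.5)])
n = A[1]                              # a fresh Node built from record 1
A[2] = Node(7, 9.0)                   # stored back as a record
A.underlying["node_attribute"] += 1   # vectorized update on the storage
```

Note that A[1] returns a copy, so changes made to n are not written back unless it is assigned again.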
> I should say, if you can get away with treating your nodes more like C
> structures and writing (possibly vectorized) functions to act on them,
> you can avoid all this mumbo jumbo:
>

Yes, what we want is something like an object array with efficient
contiguous storage. Record types are a step in that direction; they just
aren't complete enough. Hmm, this is an argument for using methods instead
of functions, so that you could sort on the columns of the matrices in a
stack by doing something like a[...].sort(axis=1).


> import numpy as np
>
> # np.int and np.float were removed from NumPy; use explicit types
> node_dtype = [("node_attribute", np.int64), ("weight", np.float64)]
> A = np.zeros(10, dtype=node_dtype)
>
> def nodes_update(A):
>    A["node_attribute"] += 1
>
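
For reference, a self-contained version of the record-array approach quoted above might look like the sketch below. The field names follow the quoted example; the input vector and the weight update rule are made up purely to show a vectorized update.

```python
import numpy as np

# Field names follow the quoted example; sizes and values are illustrative.
node_dtype = np.dtype([("node_attribute", np.int64), ("weight", np.float64)])
nodes = np.zeros(10, dtype=node_dtype)

def nodes_update(nodes, input_vector):
    """Update every node at once, with no Python-level loop."""
    nodes["node_attribute"] += 1
    nodes["weight"] += 0.5 * input_vector  # hypothetical update rule

nodes_update(nodes, np.arange(10.0))
```

Each field access such as nodes["weight"] is a view into the same contiguous buffer, so the in-place updates modify the records directly.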

Chuck