Re: [Numpy-discussion] question about creating numpy arrays

20 May 2010


On Thu, May 20, 2010 at 10:44 AM, Benjamin Root wrote:
...
...
I gave two counterexamples of why.
The examples you gave aren't counterexamples.  See below...
I'm not interested in arguing over semantics. I've discovered an issue
with how numpy deals with lists of objects that derive from ndarray,
and am concerned about the implications for classes that extend
ndarray.
...
On Wed, May 19, 2010 at 7:06 PM, Darren Dale  wrote:
...
On Wed, May 19, 2010 at 4:19 PM,   wrote:
...
On Wed, May 19, 2010 at 4:08 PM, Darren Dale  wrote:
...
I have a question about creation of numpy arrays from a list of
objects, which bears on the Quantities project and also on masked
arrays:
...
...
> import quantities as pq
> import numpy as np
> a, b = 12*pq.m, 1*pq.s
> np.array([a, b])
array([ 12.,   1.])
Why doesn't that create an object array? Similarly:
Consider the use case of a person creating a 1-D numpy array:
 > np.array([12.0, 1.0])
array([ 12.,  1.])
How is Python supposed to tell the difference between
 > np.array([a, b])
and
 > np.array([12.0, 1.0])
?
It can't, and there are plenty of times when one wants to explicitly
initialize a small numpy array with a few discrete variables.
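
One way to keep the objects themselves is to ask for an object array
explicitly and fill it element by element. The sketch below is only an
illustration: Tagged is a hypothetical stand-in for a unit-carrying ndarray
subclass (it is not the Quantities API), and it uses one-element 1-D arrays
rather than scalars, so the flattened result has shape (2, 1).

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass carrying a unit label (illustration only)."""

    def __new__(cls, values, unit):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Propagate the label to views and slices when the parent has one.
        self.unit = getattr(obj, "unit", None)


a = Tagged([12.0], "m")
b = Tagged([1.0], "s")

# np.array converts the list to a base-class float array; the labels are gone.
flattened = np.array([a, b])

# Pre-allocating with dtype=object stores the original instances unchanged.
boxed = np.empty(2, dtype=object)
boxed[0] = a
boxed[1] = b

print(type(flattened).__name__, flattened.dtype, flattened.shape)
print(boxed.dtype, boxed[0].unit, boxed[1].unit)
```

The caller has to opt in to the object array here; np.array itself still
makes the same choice for both kinds of input.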
...
...
...
...
...
> m = np.ma.array([1], mask=[True])
> m
masked_array(data = [--],
            mask = [ True],
      fill_value = 999999)
...
...
> np.array([m])
array([[1]])
Again, this is expected behavior.  Numpy saw a list containing an array,
so it produced a 2-D array. Consider the following:
 > np.array([[12, 4, 1], [32, 51, 9]])
I, as a user, expect numpy to create a 2-D array (2 rows, 3 columns) from
that list of lists.
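
For the masked-array case specifically, the difference between the
base-class constructor and a mask-aware routine can be seen in a small
sketch. It assumes a current NumPy with the numpy.ma module, and it uses
np.ma.concatenate, which joins along the existing axis, so the result is
1-D rather than the 2-D array that np.array builds.

```python
import numpy as np

m = np.ma.array([1], mask=[True])

# np.array returns a base-class ndarray, so the mask is not carried over.
plain = np.array([m])

# np.ma.concatenate is mask-aware and returns a MaskedArray.
joined = np.ma.concatenate([m, m])

print(type(plain).__name__, plain.shape)
print(type(joined).__name__, joined.mask)
```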
...
...
...
This has broader implications than just creating arrays, for example:
...
...
> np.sum([m, m])
2
> np.sum([a, b])
13.0
If you wanted sums from each object, there are clearer ways to go about
it.  If you have a predetermined number of numpy-compatible objects, say
a, b, c, then you can explicitly call the sum for each one:
 > a_sum = np.sum(a)
 > b_sum = np.sum(b)
 > c_sum = np.sum(c)
Which I think communicates the programmer's intention better than (for a
numpy array, x, composed of a, b, c):
 > object_sums = np.sum(x)       # <--- As a numpy user, I would expect a
                                 #      scalar out of this, not an array
If you have an arbitrary number of objects (which is what I suspect you
have), then one could easily produce an array of sums (for a list, x, of
numpy-compatible objects) like so:
 > object_sums = [np.sum(anObject) for anObject in x]
Performance-wise, it should be no more or less efficient than having numpy
somehow produce an array of sums from a single call to sum.
Readability-wise, it makes more sense because when you are treating objects
separately, a *list* of them is more intuitive than a numpy.array, which is
more-or-less treated as a single mathematical entity.
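
A runnable version of the list-comprehension approach, as a minimal sketch:
the two masked arrays are made-up sample data, chosen with different
lengths to show that a plain list does not need its members to share a
shape.

```python
import numpy as np

# Made-up sample data: two masked arrays of different lengths.
x = [
    np.ma.array([1, 2, 3], mask=[False, True, False]),
    np.ma.array([10, 20], mask=[True, False]),
]

# np.sum is applied to each object separately, so each object's own
# sum (here the mask-aware one) is used: 1 + 3 and 20.
object_sums = [np.sum(an_object) for an_object in x]

print(object_sums)
```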
I hope that addresses your concerns.
I appreciate the response, but you are arguing that it is not a
problem, and I'm certain that it is. It may not be numpy