question about creating numpy arrays
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
>>> m = np.ma.array([1], mask=[True])
>>> m
masked_array(data = [--],
             mask = [ True],
       fill_value = 999999)
>>> np.array([m])
array([[1]])
This has broader implications than just creating arrays, for example:
>>> np.sum([m, m])
2
>>> np.sum([a, b])
3.0
Any thoughts?

Thanks,
Darren
On Wed, May 19, 2010 at 4:08 PM, Darren Dale <dsdale24@gmail.com> wrote:
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
>>> m = np.ma.array([1], mask=[True])
>>> m
masked_array(data = [--],
             mask = [ True],
       fill_value = 999999)
>>> np.array([m])
array([[1]])
This has broader implications than just creating arrays, for example:
>>> np.sum([m, m])
2
>>> np.sum([a, b])
3.0
Any thoughts?
These are "array_like" of floats, so why should it create anything else than an array of floats. It's the most common usecase "array_like" is the most popular type for parameters in the docstrings Josef
On Wed, May 19, 2010 at 4:19 PM, <josef.pktd@gmail.com> wrote:
On Wed, May 19, 2010 at 4:08 PM, Darren Dale <dsdale24@gmail.com> wrote:
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
>>> m = np.ma.array([1], mask=[True])
>>> m
masked_array(data = [--],
             mask = [ True],
       fill_value = 999999)
>>> np.array([m])
array([[1]])
This has broader implications than just creating arrays, for example:
>>> np.sum([m, m])
2
>>> np.sum([a, b])
3.0
Any thoughts?
These are "array_like" of floats, so why should it create anything else than an array of floats.
I gave two counterexamples of why.
I gave two counterexamples of why.
The examples you gave aren't counterexamples. See below...

On Wed, May 19, 2010 at 7:06 PM, Darren Dale <dsdale24@gmail.com> wrote:
On Wed, May 19, 2010 at 4:19 PM, <josef.pktd@gmail.com> wrote:
On Wed, May 19, 2010 at 4:08 PM, Darren Dale <dsdale24@gmail.com> wrote:
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
Consider the use case of a person creating a 1-D numpy array:
>>> np.array([12.0, 1.0])
array([ 12.,   1.])
How is python supposed to tell the difference between
np.array([a, b]) and np.array([12.0, 1.0])?
It can't, and there are plenty of times when one wants to explicitly initialize a small numpy array with a few discrete variables.
>>> m = np.ma.array([1], mask=[True])
>>> m
masked_array(data = [--],
             mask = [ True],
       fill_value = 999999)
>>> np.array([m])
array([[1]])
Again, this is expected behavior. Numpy saw an array of an array, therefore, it produced a 2-D array. Consider the following:
>>> np.array([[12, 4, 1], [32, 51, 9]])
I, as a user, expect numpy to create a 2-D array (2 rows, 3 columns) from that array of arrays.
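The same nesting rule accounts for the masked-array result; a quick sketch in plain numpy:

import numpy as np

print(np.array([[12, 4, 1], [32, 51, 9]]).shape)  # (2, 3)

# m is a length-1 array, so [m] presents one extra level of
# nesting and np.array builds a 1x1, 2-D result:
m = np.ma.array([1], mask=[True])
print(np.array([m]).shape)                        # (1, 1)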
This has broader implications than just creating arrays, for example:
>>> np.sum([m, m])
2
>>> np.sum([a, b])
3.0
If you wanted sums from each object, there are some better (i.e., more clear) ways to go about it. If you have a predetermined number of numpy-compatible objects, say a, b, c, then you can explicitly call the sum for each one:
a_sum = np.sum(a)
b_sum = np.sum(b)
c_sum = np.sum(c)
Which I think communicates the programmer's intention better than (for a numpy array, x, composed of a, b, c):
object_sums = np.sum(x) # <--- As a numpy user, I would expect a scalar out of this, not an array
If you have an arbitrary number of objects (which is what I suspect you have), then one could easily produce an array of sums (for a list, x, of numpy-compatible objects) like so:
object_sums = [np.sum(anObject) for anObject in x]
Performance-wise, it should be no more or less efficient than having numpy somehow produce an array of sums from a single call to sum. Readability-wise, it makes more sense because when you are treating objects separately, a *list* of them is more intuitive than a numpy.array, which is more-or-less treated as a single mathematical entity.

I hope that addresses your concerns.

Ben Root
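Applied to the quantities example that opened the thread, the list-comprehension route would look like this (a sketch, assuming the quantities package is importable; each per-object sum keeps its own units):

import numpy as np
import quantities as pq

a, b = 2*pq.m, 1*pq.s

# one sum per object, as suggested above; each call dispatches to
# the Quantity's own sum method, so the units survive
object_sums = [np.sum(q) for q in [a, b]]  # 2.0 m and 1.0 s, not bare floats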
On Thu, May 20, 2010 at 9:44 AM, Benjamin Root <ben.root@ou.edu> wrote:
I gave two counterexamples of why.
The examples you gave aren't counterexamples. See below...
On Wed, May 19, 2010 at 7:06 PM, Darren Dale <dsdale24@gmail.com> wrote:
On Wed, May 19, 2010 at 4:19 PM, <josef.pktd@gmail.com> wrote:
On Wed, May 19, 2010 at 4:08 PM, Darren Dale <dsdale24@gmail.com> wrote:
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
Consider the use case of a person creating a 1-D numpy array:

>>> np.array([12.0, 1.0])
array([ 12.,   1.])
How is python supposed to tell the difference between np.array([a, b]) and np.array([12.0, 1.0])?
It can't, and there are plenty of times when one wants to explicitly initialize a small numpy array with a few discrete variables.
What do you mean it can't? 12.0 and 1.0 are floats, a and b are not. While, yes, they can be coerced to floats, this is a *lossy* transformation--it strips away information contained in the class, and IMHO should not be the default behavior. If I want the objects, I can force it:

In [7]: np.array([a, b], dtype=np.object)
Out[7]: array([2.0 m, 1.0 s], dtype=object)

This works fine, but feels ugly since I have to explicitly tell numpy not to do something. It feels to me like it's violating the principle of "in the face of ambiguity, refuse the temptation to guess."

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
On Thu, May 20, 2010 at 10:30 AM, Ryan May <rmay31@gmail.com> wrote:
On Thu, May 20, 2010 at 9:44 AM, Benjamin Root <ben.root@ou.edu> wrote:
I gave two counterexamples of why.
The examples you gave aren't counterexamples. See below...
On Wed, May 19, 2010 at 7:06 PM, Darren Dale <dsdale24@gmail.com> wrote:
On Wed, May 19, 2010 at 4:19 PM, <josef.pktd@gmail.com> wrote:
On Wed, May 19, 2010 at 4:08 PM, Darren Dale <dsdale24@gmail.com> wrote:
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
Consider the use case of a person creating a 1-D numpy array:
>>> np.array([12.0, 1.0])
array([ 12.,   1.])
How is python supposed to tell the difference between
np.array([a, b]) and np.array([12.0, 1.0])?
It can't, and there are plenty of times when one wants to explicitly initialize a small numpy array with a few discrete variables.
What do you mean it can't? 12.0 and 1.0 are floats, a and b are not. While, yes, they can be coerced to floats, this is a *lossy* transformation--it strips away information contained in the class, and IMHO should not be the default behavior. If I want the objects, I can force it:
In [7]: np.array([a, b], dtype=np.object)
Out[7]: array([2.0 m, 1.0 s], dtype=object)
This works fine, but feels ugly since I have to explicitly tell numpy not to do something. It feels to me like it's violating the principle of "in the face of ambiguity, refuse the temptation to guess."
I have thought about this further, and I think I am starting to see your point (from both of you). Here are my thoughts:

As I understand it, numpy.array() (rather, array_like()) essentially builds the dimensions of the array by first identifying if there is an iterable object, and then checking if the contents of the iterable are also iterable, until it reaches a non-iterable.

Therefore, the question becomes: why is numpy.array() implicitly coercing the non-iterable type into a numeric? Is there some reason that I am not seeing for why there is an implicit coercion?

At first glance, I did not see a problem with this behavior, and I have come to expect it (hence my original reply). But now, I am not quite so sure.
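The coercion Ben asks about can be watched directly: once the nesting walk bottoms out, np.array picks a common dtype for the leaves, and it only falls back to object when no common numeric or string type exists. A short sketch:

import numpy as np

print(np.array([1, 2.5]).dtype)        # float64 -- int promoted to float
print(np.array([1, 'x']).dtype)        # a string dtype
print(np.array([1, object()]).dtype)   # object -- no common type found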
On Thu, May 20, 2010 at 12:13 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Thu, May 20, 2010 at 10:30 AM, Ryan May <rmay31@gmail.com> wrote:
On Thu, May 20, 2010 at 9:44 AM, Benjamin Root <ben.root@ou.edu> wrote:
I gave two counterexamples of why.
The examples you gave aren't counterexamples. See below...
On Wed, May 19, 2010 at 7:06 PM, Darren Dale <dsdale24@gmail.com> wrote:
On Wed, May 19, 2010 at 4:19 PM, <josef.pktd@gmail.com> wrote:
On Wed, May 19, 2010 at 4:08 PM, Darren Dale <dsdale24@gmail.com> wrote:
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
Consider the use case of a person creating a 1-D numpy array:

>>> np.array([12.0, 1.0])
array([ 12.,   1.])
How is python supposed to tell the difference between np.array([a, b]) and np.array([12.0, 1.0])?
It can't, and there are plenty of times when one wants to explicitly initialize a small numpy array with a few discrete variables.
What do you mean it can't? 12.0 and 1.0 are floats, a and b are not. While, yes, they can be coerced to floats, this is a *lossy* transformation--it strips away information contained in the class, and IMHO should not be the default behavior. If I want the objects, I can force it:
In [7]: np.array([a, b], dtype=np.object)
Out[7]: array([2.0 m, 1.0 s], dtype=object)
This works fine, but feels ugly since I have to explicitly tell numpy not to do something. It feels to me like it's violating the principle of "in the face of ambiguity, refuse the temptation to guess."
I have thought about this further, and I think I am starting to see your point (from both of you). Here are my thoughts:
As I understand it, numpy.array() (rather, array_like()) essentially builds the dimensions of the array by first identifying if there is an iterable object, and then checking if the contents of the iterable are also iterable, until it reaches a non-iterable.
Therefore, the question becomes: why is numpy.array() implicitly coercing the non-iterable type into a numeric? Is there some reason that I am not seeing for why there is an implicit coercion?
I think it's because the dtype is numeric (float); otherwise it wouldn't operate on numbers, and none of the other numerical functions would work (just a guess):
>>> a = np.array(['2.0', '1.0'], dtype=object)
>>> a
array([2.0, 1.0], dtype=object)
>>> np.sqrt(a)
Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    np.sqrt(a)
AttributeError: sqrt
>>> np.array([a, a])
array([[2.0, 1.0],
       [2.0, 1.0]], dtype=object)
>>> 2*a
array([2.02.0, 1.01.0], dtype=object)
>>> b = np.array(['2.0', '1.0'])
>>> np.sqrt(b)
NotImplemented
>>> np.array([b, b])
array([['2.0', '1.0'],
       ['2.0', '1.0']],
      dtype='|S3')
>>> 2*b
Traceback (most recent call last):
  File "<pyshell#41>", line 1, in <module>
    2*b
TypeError: unsupported operand type(s) for *: 'int' and 'numpy.ndarray'
Josef
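The AttributeError in that session is the object-dtype fallback at work: a ufunc applied to an object array looks for a method of the same name on each element (strings have no .sqrt, hence the error). A sketch with a hypothetical element type:

import numpy as np

class HasSqrt(object):
    # hypothetical element type, only here to show the dispatch
    def __init__(self, v):
        self.v = v
    def sqrt(self):
        return HasSqrt(self.v ** 0.5)
    def __repr__(self):
        return 'HasSqrt(%s)' % self.v

arr = np.array([HasSqrt(4.0), HasSqrt(9.0)], dtype=object)
print(np.sqrt(arr))  # [HasSqrt(2.0) HasSqrt(3.0)]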
At first glance, I did not see a problem with this behavior, and I have come to expect it (hence my original reply). But now, I am not quite so sure.
On Thu, May 20, 2010 at 10:44 AM, Benjamin Root <ben.root@ou.edu> wrote:
I gave two counterexamples of why.
The examples you gave aren't counterexamples. See below...
I'm not interested in arguing over semantics. I've discovered an issue with how numpy deals with lists of objects that derive from ndarray, and am concerned about the implications for classes that extend ndarray.
On Wed, May 19, 2010 at 7:06 PM, Darren Dale <dsdale24@gmail.com> wrote:
On Wed, May 19, 2010 at 4:19 PM, <josef.pktd@gmail.com> wrote:
On Wed, May 19, 2010 at 4:08 PM, Darren Dale <dsdale24@gmail.com> wrote:
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
Consider the use case of a person creating a 1-D numpy array:

>>> np.array([12.0, 1.0])
array([ 12.,   1.])
How is python supposed to tell the difference between np.array([a, b]) and np.array([12.0, 1.0])?
It can't, and there are plenty of times when one wants to explicitly initialize a small numpy array with a few discrete variables.
>>> m = np.ma.array([1], mask=[True])
>>> m
masked_array(data = [--],
             mask = [ True],
       fill_value = 999999)
>>> np.array([m])
array([[1]])
Again, this is expected behavior. Numpy saw an array of an array, therefore, it produced a 2-D array. Consider the following:
>>> np.array([[12, 4, 1], [32, 51, 9]])
I, as a user, expect numpy to create a 2-D array (2 rows, 3 columns) from that array of arrays.
This has broader implications than just creating arrays, for example:
>>> np.sum([m, m])
2
>>> np.sum([a, b])
3.0
If you wanted sums from each object, there are some better (i.e., more clear) ways to go about it. If you have a predetermined number of numpy-compatible objects, say a, b, c, then you can explicitly call the sum for each one:

a_sum = np.sum(a)
b_sum = np.sum(b)
c_sum = np.sum(c)
Which I think communicates the programmer's intention better than (for a numpy array, x, composed of a, b, c):

object_sums = np.sum(x) # <--- As a numpy user, I would expect a scalar out of this, not an array
If you have an arbitrary number of objects (which is what I suspect you have), then one could easily produce an array of sums (for a list, x, of numpy-compatible objects) like so:

object_sums = [np.sum(anObject) for anObject in x]
Performance-wise, it should be no more or less efficient than having numpy somehow produce an array of sums from a single call to sum. Readability-wise, it makes more sense because when you are treating objects separately, a *list* of them is more intuitive than a numpy.array, which is more-or-less treated as a single mathematical entity.
I hope that addresses your concerns.
I appreciate the response, but you are arguing that it is not a problem, and I'm certain that it is. It may not be numpy
[sorry, my last got cut off]

On Thu, May 20, 2010 at 11:37 AM, Darren Dale <dsdale24@gmail.com> wrote:
On Thu, May 20, 2010 at 10:44 AM, Benjamin Root <ben.root@ou.edu> wrote:
I gave two counterexamples of why.
The examples you gave aren't counterexamples. See below...
I'm not interested in arguing over semantics. I've discovered an issue with how numpy deals with lists of objects that derive from ndarray, and am concerned about the implications for classes that extend ndarray.
On Wed, May 19, 2010 at 7:06 PM, Darren Dale <dsdale24@gmail.com> wrote:
On Wed, May 19, 2010 at 4:19 PM, <josef.pktd@gmail.com> wrote:
On Wed, May 19, 2010 at 4:08 PM, Darren Dale <dsdale24@gmail.com> wrote:
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
Consider the use case of a person creating a 1-D numpy array:

>>> np.array([12.0, 1.0])
array([ 12.,   1.])
How is python supposed to tell the difference between np.array([a, b]) and np.array([12.0, 1.0])?
It can't, and there are plenty of times when one wants to explicitly initialize a small numpy array with a few discrete variables.
>>> m = np.ma.array([1], mask=[True])
>>> m
masked_array(data = [--],
             mask = [ True],
       fill_value = 999999)
>>> np.array([m])
array([[1]])
Again, this is expected behavior. Numpy saw an array of an array, therefore, it produced a 2-D array. Consider the following:
>>> np.array([[12, 4, 1], [32, 51, 9]])
I, as a user, expect numpy to create a 2-D array (2 rows, 3 columns) from that array of arrays.
This has broader implications than just creating arrays, for example:
>>> np.sum([m, m])
2
>>> np.sum([a, b])
3.0
If you wanted sums from each object, there are some better (i.e., more clear) ways to go about it. If you have a predetermined number of numpy-compatible objects, say a, b, c, then you can explicitly call the sum for each one:

a_sum = np.sum(a)
b_sum = np.sum(b)
c_sum = np.sum(c)
Which I think communicates the programmer's intention better than (for a numpy array, x, composed of a, b, c):

object_sums = np.sum(x) # <--- As a numpy user, I would expect a scalar out of this, not an array
If you have an arbitrary number of objects (which is what I suspect you have), then one could easily produce an array of sums (for a list, x, of numpy-compatible objects) like so:

object_sums = [np.sum(anObject) for anObject in x]
Performance-wise, it should be no more or less efficient than having numpy somehow produce an array of sums from a single call to sum. Readability-wise, it makes more sense because when you are treating objects separately, a *list* of them is more intuitive than a numpy.array, which is more-or-less treated as a single mathematical entity.
I hope that addresses your concerns.
I appreciate the response, but you are arguing that it is not a problem, and I'm certain that it is. It may not be numpy
It may not be numpy's problem, I can accept that. But it is definitely a problem for quantities. I'm trying to determine just how big a problem it is. I had hoped that one day quantities might become a part of numpy or scipy, but this appears to be a fundamental issue and it makes me doubt that inclusion would be appropriate.

Thank you for the suggestion about calling the sum method instead of numpy's function. That is a reasonable workaround.

Darren
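The same method-based workaround helps with the masked-array case from the original post (a sketch; the coercion happens before np.sum ever sees the masks):

import numpy as np

m = np.ma.array([1], mask=[True])

print(np.sum([m, m]))  # 2 -- the list is coerced first, masks dropped
print((m + m).sum())   # masked -- the MaskedArray method honors the mask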
On 05/20/2010 10:53 AM, Darren Dale wrote:
[sorry, my last got cut off]
On Thu, May 20, 2010 at 11:37 AM, Darren Dale<dsdale24@gmail.com> wrote:
On Thu, May 20, 2010 at 10:44 AM, Benjamin Root<ben.root@ou.edu> wrote:
I gave two counterexamples of why.
The examples you gave aren't counterexamples. See below...
I'm not interested in arguing over semantics. I've discovered an issue with how numpy deals with lists of objects that derive from ndarray, and am concerned about the implications for classes that extend ndarray.
On Wed, May 19, 2010 at 7:06 PM, Darren Dale<dsdale24@gmail.com> wrote:
On Wed, May 19, 2010 at 4:19 PM,<josef.pktd@gmail.com> wrote:
On Wed, May 19, 2010 at 4:08 PM, Darren Dale<dsdale24@gmail.com> wrote:
I have a question about creation of numpy arrays from a list of objects, which bears on the Quantities project and also on masked arrays:
>>> import quantities as pq
>>> import numpy as np
>>> a, b = 2*pq.m, 1*pq.s
>>> np.array([a, b])
array([ 2.,  1.])
Why doesn't that create an object array? Similarly:
Consider the use case of a person creating a 1-D numpy array:
>>> np.array([12.0, 1.0])
array([ 12.,   1.])
How is python supposed to tell the difference between
np.array([a, b]) and np.array([12.0, 1.0])?
It can't, and there are plenty of times when one wants to explicitly initialize a small numpy array with a few discrete variables.
>>> m = np.ma.array([1], mask=[True])
>>> m
masked_array(data = [--],
             mask = [ True],
       fill_value = 999999)
>>> np.array([m])
array([[1]])
Again, this is expected behavior. Numpy saw an array of an array, therefore, it produced a 2-D array. Consider the following:
>>> np.array([[12, 4, 1], [32, 51, 9]])
I, as a user, expect numpy to create a 2-D array (2 rows, 3 columns) from that array of arrays.
This has broader implications than just creating arrays, for example:
>>> np.sum([m, m])
2
>>> np.sum([a, b])
3.0
If you wanted sums from each object, there are some better (i.e., more clear) ways to go about it. If you have a predetermined number of numpy-compatible objects, say a, b, c, then you can explicitly call the sum for each one:
a_sum = np.sum(a)
b_sum = np.sum(b)
c_sum = np.sum(c)
Which I think communicates the programmer's intention better than (for a numpy array, x, composed of a, b, c):
object_sums = np.sum(x) # <--- As a numpy user, I would expect a scalar out of this, not an array
If you have an arbitrary number of objects (which is what I suspect you have), then one could easily produce an array of sums (for a list, x, of numpy-compatible objects) like so:
object_sums = [np.sum(anObject) for anObject in x]
Performance-wise, it should be no more or less efficient than having numpy somehow produce an array of sums from a single call to sum. Readability-wise, it makes more sense because when you are treating objects separately, a *list* of them is more intuitive than a numpy.array, which is more-or-less treated as a single mathematical entity.
I hope that addresses your concerns.
I appreciate the response, but you are arguing that it is not a problem, and I'm certain that it is. It may not be numpy
It may not be numpy's problem, I can accept that. But it is definitely a problem for quantities. I'm trying to determine just how big a problem it is. I had hoped that one day quantities might become a part of numpy or scipy, but this appears to be a fundamental issue and it makes me doubt that inclusion would be appropriate.
Thank you for the suggestion about calling the sum method instead of numpy's function. That is a reasonable workaround.
Darren
Hi,

np.array is an array-creating function: numpy.array takes an array_like input and it *will* try to convert that input into an array. (This also occurs when you give np.array a masked array as an input.) This is a 'feature', especially when you don't use the dtype argument, and it applies to any numpy function that takes array_like inputs.

I do not use quantities, but you either have to get the user to use the appropriate quantities functions or let it remain 'user beware' when they do not use the appropriate functions. In the longer term you have to get numpy to 'do the right thing' with quantities objects.

Bruce
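Bruce's masked-array aside is easy to see directly (a short sketch):

import numpy as np

m = np.ma.array([1], mask=[True])

print(np.array(m))       # [1] -- the conversion silently drops the mask
print(np.ma.asarray(m))  # [--] -- the ma-aware constructor keeps it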
On Thu, May 20, 2010 at 12:07 PM, Bruce Southey <bsouthey@gmail.com> wrote:
np.array is an array-creating function: numpy.array takes an array_like input and it *will* try to convert that input into an array. (This also occurs when you give np.array a masked array as an input.) This is a 'feature', especially when you don't use the dtype argument, and it applies to any numpy function that takes array_like inputs.
Ok. I can accept that.
I do not use quantities, but you either have to get the user to use the appropriate quantities functions or let it remain 'user beware' when they do not use the appropriate functions. In the longer term you have to get numpy to 'do the right thing' with quantities objects.
I have done a bit of development on numpy to try to extend the __array_wrap__ mechanism so quantities could tell numpy how to do the right thing in many situations. That has been largely successful, but this issue we are discussing is demonstrating some unanticipated limitations. You may be right that this is a "user-beware" situation, since in this case there appears to be no way for an ndarray subclass to step in and influence what numpy will do with a list of those instances.

Darren
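For reference, a minimal sketch of the __array_wrap__ mechanism Darren describes (Tagged is a hypothetical subclass; the hook fires after ufuncs, but nothing analogous fires during np.array([t, t])):

import numpy as np

class Tagged(np.ndarray):
    # hypothetical minimal subclass that carries a .tag through ufuncs
    def __new__(cls, data, tag=None):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.tag = tag
        return obj

    def __array_finalize__(self, obj):
        self.tag = getattr(obj, 'tag', None)

    def __array_wrap__(self, out_arr, context=None):
        # numpy calls this after a ufunc, letting the subclass
        # re-wrap (and annotate) the raw result
        result = out_arr.view(type(self))
        result.tag = self.tag
        return result

t = Tagged([1.0, 4.0], tag='m')
print(type(np.sqrt(t)).__name__, np.sqrt(t).tag)  # Tagged m

# array *creation* never consults the subclass, so there is no
# hook that could preserve the tag here:
print(type(np.array([t, t])).__name__)            # ndarray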
participants (5)
- Benjamin Root
- Bruce Southey
- Darren Dale
- josef.pktd@gmail.com
- Ryan May