
What's the most straightforward way to count, say, the number of 1s or Trues in the array? Or the number of any integer?
I was surprised to discover recently that there isn't a count() method as there is for Python lists. Sorry if this has been discussed already, but I'm wondering if there's a reason for its absence.
I came across a thread in March:
http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/3066460
that talked a bit about this in terms of speed, but what about just the convenience of having a count() method?
Looks like masked arrays have a count method, don't know much about them though.
Also, I understand that converting between binary and decimal floating point representations is inexact, which makes counting a specific float value in an array somewhat undefined, yet it seems to work in Python lists:
>>> 1.1
1.1000000000000001
>>> a=[1.1, 1.1, 1.2]
>>> a
[1.1000000000000001, 1.1000000000000001, 1.2]
>>> a.count(1.1)
2
>>> a.count(1.1000000000000001)
2
>>> a.count(1.2)
1
Comments?
Martin

On 9/7/06, Martin Spacek scipy@mspacek.mm.st wrote:
What's the most straightforward way to count, say, the number of 1s or Trues in the array? Or the number of any integer?
I was surprised to discover recently that there isn't a count() method as there is for Python lists. Sorry if this has been discussed already, but I'm wondering if there's a reason for its absence.
I don't know about count, but you can gin up something like this
In [78]: a = ran.randint(0,2, size=(10,))
In [79]: a
Out[79]: array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
In [80]: b = sort(a)
In [81]: b.searchsorted(1, side='right') - b.searchsorted(1, side='left')
Out[81]: 6
Which counts the number of ones in a.
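A minimal sketch of the sort-and-search idea above, written for a current NumPy imported as `np`. The helper name `count_sorted` is hypothetical, and the array is hard-coded to the values shown in the session rather than drawn at random.

```python
import numpy as np

def count_sorted(sorted_arr, value):
    """Count occurrences of `value` in an already sorted 1-D array."""
    # Index of the first element >= value.
    left = np.searchsorted(sorted_arr, value, side="left")
    # Index just past the last element <= value.
    right = np.searchsorted(sorted_arr, value, side="right")
    return int(right - left)

a = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
b = np.sort(a)

ones = count_sorted(b, 1)
zeros = count_sorted(b, 0)
print(ones, zeros)
```

Sorting costs more than a single pass over the data, so this approach probably only pays off when many different values are counted against the same sorted array.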
I came across a thread in March:
http://aspn.activestate.com/ASPN/Mail/Message/numpy-discussion/3066460
that talked a bit about this in terms of speed, but what about just the convenience of having a count() method?
Looks like masked arrays have a count method, don't know much about them though.
Also, I understand that converting between binary and decimal floating point representations is inexact, which makes counting a specific float value in an array somewhat undefined, yet it seems to work in Python lists:
>>> 1.1
1.1000000000000001
>>> a=[1.1, 1.1, 1.2]
>>> a
[1.1000000000000001, 1.1000000000000001, 1.2]
>>> a.count(1.1)
2
>>> a.count(1.1000000000000001)
2
>>> a.count(1.2)
1
Well, 1.1 == 1.1000000000000001 and that doesn't change. You probably need to use different precisions to run into problems.
Chuck
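The point about mixed precisions can be illustrated with a short sketch. It assumes NumPy imported as `np` and uses `np.isclose` with its default tolerances, which is one possible choice and not something proposed in the thread; all variable names are illustrative.

```python
import numpy as np

x64 = np.float64(1.1)
x32 = np.float32(1.1)

# Same precision on both sides: the rounded values are identical.
same_precision = bool(x64 == 1.1)

# Widening the float32 value exposes its different rounding of 1.1.
mixed_precision = bool(np.float64(x32) == x64)

a32 = np.array([1.1, 1.1, 1.2], dtype=np.float32)

# Exact comparison after widening to float64 finds no matches.
exact_hits = int((a32.astype(np.float64) == 1.1).sum())

# A tolerance-based comparison finds the two values near 1.1.
close_hits = int(np.isclose(a32, 1.1).sum())

print(same_precision, mixed_precision, exact_hits, close_hits)
```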

Charles R Harris charlesr.harris@gmail.com [2006-09-07 15:04]:
I don't know about count, but you can gin up something like this
In [78]: a = ran.randint(0,2, size=(10,))
In [79]: a
Out[79]: array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
This exposed inconsistent randint() behavior between SciPy and the Python random module. The Python randint includes the upper endpoint. The SciPy version excludes it.
>>> from random import randint
>>> for i in range(50):
...     print randint(0,2),
...
0 1 1 1 1 1 0 0 2 1 1 0 2 2 1 2 0 2 0 0 0 2 2 2 2 2 2 2 1 2 2 0 0 1 2 2 0 1 1 0 2 0 1 2 1 2 2 2 1 1
>>> from scipy import *
>>> print random.randint(0,2, size=(100,))
[0 1 1 1 1 0 1 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 0 1 0 0 0 1 1 1 1 0 1 1 1 0 0 0 1 1 0 0 0 1 0 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 0 0]
-rex

rex wrote:
Charles R Harris charlesr.harris@gmail.com [2006-09-07 15:04]:
I don't know about count, but you can gin up something like this
In [78]: a = ran.randint(0,2, size=(10,))
In [79]: a
Out[79]: array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
This exposed inconsistent randint() behavior between SciPy and the Python random module. The Python randint includes the upper endpoint. The SciPy version excludes it.
numpy.random.random_integers() includes the upper bound, if you like. numpy.random does not try to emulate the standard library's random module.
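To make the two endpoint conventions concrete, here is a small sketch. It assumes a NumPy recent enough to provide `numpy.random.default_rng` (1.17 or later), which the thread itself does not use. Later NumPy releases deprecated `random_integers()` in favor of calling `randint()` with the upper bound plus one. Variable names and seeds are illustrative only.

```python
import random
import numpy as np

random.seed(0)
np.random.seed(0)

# Standard library: both endpoints are inclusive, so 2 can appear.
stdlib_draws = {random.randint(0, 2) for _ in range(1000)}

# numpy.random.randint: the upper bound is exclusive, so only 0 and 1 appear.
numpy_draws = set(np.random.randint(0, 2, size=1000).tolist())

# Passing high + 1 reproduces the inclusive range with randint.
shifted_draws = set(np.random.randint(0, 2 + 1, size=1000).tolist())

# Generator.integers can include the endpoint explicitly (assumes NumPy >= 1.17).
rng = np.random.default_rng(0)
inclusive_draws = set(rng.integers(0, 2, size=1000, endpoint=True).tolist())

print(sorted(stdlib_draws), sorted(numpy_draws))
```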

Robert Kern robert.kern@gmail.com [2006-09-07 16:35]:
rex wrote:
Charles R Harris charlesr.harris@gmail.com [2006-09-07 15:04]:
I don't know about count, but you can gin up something like this
In [78]: a = ran.randint(0,2, size=(10,))
In [79]: a
Out[79]: array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
This exposed inconsistent randint() behavior between SciPy and the Python random module. The Python randint includes the upper endpoint. The SciPy version excludes it.
numpy.random.random_integers() includes the upper bound, if you like. numpy.random does not try to emulate the standard library's random module.
I'm not in a position to argue the merits, but IMHO, when code that previously worked silently starts returning subtly bad results after importing numpy, there is a problem. What possible upside is there in having randint() behave one way in the random module and silently behave differently in numpy?
More generally, since numpy.random does not try to emulate the random module, how does one convert from code that uses the random module to numpy? Is randint() the only silent problem, or are there others? If so, how does one discover them? Are they documented anywhere?
I deeply appreciate the countless hours the core developers have contributed to numpy/scipy, but sometimes I think you are too close to the problems to fully appreciate the barriers to widespread adoption such silent "gotchas" present. If the code breaks, fine, you know there's a problem. When it runs, but returns wrong -- but not obviously wrong -- results, there's a serious problem that will deter a significant number of people from ever trying the product again.
Again, what is the upside of changing the behavior of the standard library's randint() without also changing the name?
-rex

rex wrote:
Robert Kern robert.kern@gmail.com [2006-09-07 16:35]:
rex wrote:
Charles R Harris charlesr.harris@gmail.com [2006-09-07 15:04]:
I don't know about count, but you can gin up something like this
In [78]: a = ran.randint(0,2, size=(10,))
In [79]: a
Out[79]: array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
This exposed inconsistent randint() behavior between SciPy and the Python random module. The Python randint includes the upper endpoint. The SciPy version excludes it.
numpy.random.random_integers() includes the upper bound, if you like. numpy.random does not try to emulate the standard library's random module.
I'm not in a position to argue the merits, but IMHO, when code that previously worked silently starts returning subtly bad results after importing numpy, there is a problem. What possible upside is there in having randint() behave one way in the random module and silently behave differently in numpy?
I don't understand you. Importing numpy does not change the standard library's random module in any way. There is no silent difference in behavior. If you use numpy.random you get one set of behavior. If you use random, you get another. Pick the one you want. They're not interchangeable, and nothing suggests that they ought to be.
More generally, since numpy.random does not try to emulate the random module, how does one convert from code that uses the random module to numpy? Is randint() the only silent problem, or are there others? If so, how does one discover them? Are they documented anywhere?
The docstrings in that module are complete.
I deeply appreciate the countless hours the core developers have contributed to numpy/scipy, but sometimes I think you are too close to the problems to fully appreciate the barriers to widespread adoption such silent "gotchas" present. If the code breaks, fine, you know there's a problem. When it runs, but returns wrong -- but not obviously wrong -- results, there's a serious problem that will deter a significant number of people from ever trying the product again.
Again, what is the upside of changing the behavior of the standard library's randint() without also changing the name?
Again, numpy.random has nothing to do with the standard library module random. The names of the functions match those in the PRNG facilities that used to be in Numeric and scipy which numpy.random is replacing. Specifically, numpy.random.randint() derives its behavior from Numeric's RandomArray.randint().

Robert Kern robert.kern@gmail.com [2006-09-08 06:51]:
rex wrote:
Robert Kern robert.kern@gmail.com [2006-09-07 16:35]:
rex wrote:
This exposed inconsistent randint() behavior between SciPy and the Python random module. The Python randint includes the upper endpoint. The SciPy version excludes it.
I'm not in a position to argue the merits, but IMHO, when code that previously worked silently starts returning subtly bad results after importing numpy, there is a problem. What possible upside is there in having randint() behave one way in the random module and silently behave differently in numpy?
I don't understand you.
That's because I wasn't making any sense. :(
Importing numpy does not change the standard library's random module in any way. There is no silent difference in behavior. If you use numpy.random you get one set of behavior. If you use random, you get another. Pick the one you want. They're not interchangeable, and nothing suggests that they ought to be.
Of course you're right. I thought the name would be overwritten, and it isn't. Sorry for wasting your time. :(
Thanks,
-rex

Martin Spacek wrote:
What's the most straightforward way to count, say, the number of 1s or Trues in the array? Or the number of any integer?
I was surprised to discover recently that there isn't a count() method as there is for Python lists. Sorry if this has been discussed already, but I'm wondering if there's a reason for its absence.
Mostly, it's simply easy enough to implement yourself. Not all one-liners should be methods on the array object.
(a == value).sum()
Of course, there are several different things you might do. You might want to have multiple values broadcast across the array. You might want to reduce the count along a given axis. You might want to use floating point comparison with a tolerance. Putting all of those options into one method reduces the convenience of that method. Putting a crippled implementation (i.e. just that one-liner) expands the already-enormous API of the ndarray object without much benefit.
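The variations listed above (broadcasting several values, reducing along an axis, comparing with a tolerance) might look roughly like the following sketch. The helper `count_equal` and its `tol` parameter are hypothetical and not part of NumPy; the sketch assumes a NumPy version whose `np.count_nonzero` accepts an `axis` argument.

```python
import numpy as np

def count_equal(a, value, axis=None, tol=None):
    """Hypothetical helper: count elements of `a` equal to `value`.

    `value` may be an array that broadcasts against `a`. If `tol` is
    given, elements within that absolute tolerance count as equal.
    """
    a = np.asarray(a)
    if tol is None:
        mask = (a == value)
    else:
        mask = np.isclose(a, value, rtol=0.0, atol=tol)
    return np.count_nonzero(mask, axis=axis)

a = np.array([[1, 2, 3],
              [1, 1, 3]])

total = count_equal(a, 1)                # whole-array count
per_column = count_equal(a, 1, axis=0)   # reduce along the first axis

# Broadcast three candidate values against a new trailing axis.
per_value = count_equal(a[..., np.newaxis], np.array([1, 2, 3]), axis=(0, 1))

# Floating point comparison, exact and with an absolute tolerance.
floats = [0.1 + 0.2, 0.3, 0.4]
exact = count_equal(floats, 0.3)
tolerant = count_equal(floats, 0.3, tol=1e-9)
```

Even this small helper already needs three parameters, which is in line with the argument above that a single `count()` method would struggle to stay convenient.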

On 9/7/06, Martin Spacek scipy@mspacek.mm.st wrote:
What's the most straightforward way to count, say, the number of 1s or Trues in the array? Or the number of any integer?
I was surprised to discover recently that there isn't a count() method as there is for Python lists. Sorry if this has been discussed already, but I'm wondering if there's a reason for its absence.
You don't really need count with ndarrays:
>>> from numpy import *
>>> a = array([1,2,3,1,2,3,1,2])
>>> (a==3).sum()
2
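For the original question about counting "any integer", a sketch along these lines tallies every value at once. It assumes NumPy imported as `np`; `np.unique(..., return_counts=True)` and `np.bincount` are suggestions that do not appear in the thread, and the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 1, 2, 3, 1, 2])

# Single value: gives the same result as (a == 3).sum().
n_threes = int(np.count_nonzero(a == 3))

# All distinct values and how often each occurs.
values, counts = np.unique(a, return_counts=True)
table = dict(zip(values.tolist(), counts.tolist()))

# For non-negative integers only: index i holds the count of value i.
bins = np.bincount(a)

print(n_threes, table, bins.tolist())
```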

participants (7)
- Alexander Belopolsky
- Charles R Harris
- Martin Spacek
- Martin Spacek
- rex
- Robert Kern
- Sasha