What's the most straightforward way to count, say, the number of 1s or Trues in the array? Or the number of any integer?
I was surprised to discover recently that there isn't a count() method as there is for Python lists. Sorry if this has been discussed already, but I'm wondering if there's a reason for its absence.
I came across a thread in March:
http://aspn.activestate.com/ASPN/Mail/Message/numpydiscussion/3066460
that talked a bit about this in terms of speed, but what about just the convenience of having a count() method?
Looks like masked arrays have a count method, don't know much about them though.
Also, I understand the inaccuracies when converting between binary and decimal floating point representations, and therefore making counting of a specific float value in an array somewhat undefined, yet it seems to work in Python lists:
>>> 1.1
1.1000000000000001
>>> a=[1.1, 1.1, 1.2]
>>> a
[1.1000000000000001, 1.1000000000000001, 1.2]
>>> a.count(1.1)
2
>>> a.count(1.1000000000000001)
2
>>> a.count(1.2)
1
Comments?
Martin
On 9/7/06, Martin Spacek scipy@mspacek.mm.st wrote:
What's the most straightforward way to count, say, the number of 1s or Trues in the array? Or the number of any integer?
I was surprised to discover recently that there isn't a count() method as there is for Python lists. Sorry if this has been discussed already, but I'm wondering if there's a reason for its absence.
I don't know about count, but you can gin up something like this
In [78]: a = ran.randint(0,2, size=(10,))
In [79]: a
Out[79]: array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
In [80]: b = sort(a)
In [81]: b.searchsorted(1, side='right') - b.searchsorted(1, side='left')
Out[81]: 6
Which counts the number of ones in a.
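The sort-and-searchsorted idea above can be sketched as follows. This is only an illustration: the array is typed in by hand rather than drawn at random, and np.count_nonzero is added for comparison; it is not something the message mentions.

```python
import numpy as np

# Hand-typed copy of the sample array shown in the session above.
a = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])

b = np.sort(a)  # searchsorted requires sorted input
# Index just past the last 1, minus the index of the first 1, is the number of 1s.
count_sorted = b.searchsorted(1, side='right') - b.searchsorted(1, side='left')

# Same count without sorting, using a boolean mask (an alternative, not from the thread).
count_mask = np.count_nonzero(a == 1)

print(count_sorted, count_mask)
```

Sorting costs O(n log n), so this route mainly pays off when many different values are looked up in the same sorted array.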
I came across a thread in March:
http://aspn.activestate.com/ASPN/Mail/Message/numpydiscussion/3066460
that talked a bit about this in terms of speed, but what about just the convenience of having a count() method?
Looks like masked arrays have a count method, don't know much about them though.
Also, I understand the inaccuracies when converting between binary and decimal floating point representations, and therefore making counting of a specific float value in an array somewhat undefined, yet it seems to work in Python lists:
>>> 1.1
1.1000000000000001
>>> a=[1.1, 1.1, 1.2]
>>> a
[1.1000000000000001, 1.1000000000000001, 1.2]
>>> a.count(1.1)
2
>>> a.count(1.1000000000000001)
2
>>> a.count(1.2)
1
Well, 1.1 == 1.1000000000000001 and that doesn't change. You probably need to use different precisions to run into problems.
Chuck
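The remark about different precisions can be illustrated with a small sketch. The float32 round trip and the use of np.isclose with its default tolerances are assumptions chosen for the example, not something proposed in the thread.

```python
import numpy as np

# Both literals round to the same 64-bit double, so they compare equal.
assert 1.1 == 1.1000000000000001

a64 = np.array([1.1, 1.1, 1.2])  # float64 by default
count_same_precision = int((a64 == 1.1).sum())

# Round-trip through float32: the stored values no longer equal the float64 constant.
a_roundtrip = a64.astype(np.float32).astype(np.float64)
count_mixed_precision = int((a_roundtrip == 1.1).sum())

# A tolerance-based comparison recovers the intended count.
count_with_tolerance = int(np.isclose(a_roundtrip, 1.1).sum())

print(count_same_precision, count_mixed_precision, count_with_tolerance)
```

Exact equality is reliable only while every value has been stored and compared at the same precision.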
Charles R Harris charlesr.harris@gmail.com [2006-09-07 15:04]:
I don't know about count, but you can gin up something like this
In [78]: a = ran.randint(0,2, size=(10,))
In [79]: a
Out[79]: array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
This exposed inconsistent randint() behavior between SciPy and the Python random module. The Python randint includes the upper endpoint. The SciPy version excludes it.
>>> from random import randint
>>> for i in range(50):
...     print randint(0,2),
...
0 1 1 1 1 1 0 0 2 1 1 0 2 2 1 2 0 2 0 0 0 2 2 2 2 2 2 2 1 2 2 0 0 1 2 2 0 1 1 0 2 0 1 2 1 2 2 2 1 1
>>> from scipy import *
>>> print random.randint(0,2, size=(100,))
[0 1 1 1 1 0 1 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 0 1 0 0 0 1 1 1 1 0 1 1 1 0 0 0 1 1 0 0 0 1 0 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 0 0]
rex
rex wrote:
This exposed inconsistent randint() behavior between SciPy and the Python random module. The Python randint includes the upper endpoint. The SciPy version excludes it.
numpy.random.random_integers() includes the upper bound, if you like. numpy.random does not try to emulate the standard library's random module.
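The endpoint difference discussed above can be checked with a short sketch. The seeds and sample sizes are arbitrary, and the Generator API with endpoint=True postdates this thread; it is shown only as one present-day way to get an inclusive upper bound.

```python
import random
import numpy as np

random.seed(0)
# Standard library: randint(a, b) draws from the closed range [a, b].
stdlib_values = {random.randint(0, 2) for _ in range(1000)}

# Legacy NumPy interface: randint(low, high) draws from the half-open range [low, high).
numpy_values = set(np.random.randint(0, 2, size=1000).tolist())

# Newer Generator API (not part of the thread): endpoint=True includes the upper bound.
rng = np.random.default_rng(0)
inclusive_values = set(rng.integers(0, 2, size=1000, endpoint=True).tolist())

print(sorted(stdlib_values), sorted(numpy_values), sorted(inclusive_values))
```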
Robert Kern robert.kern@gmail.com [2006-09-07 16:35]:
numpy.random.random_integers() includes the upper bound, if you like. numpy.random does not try to emulate the standard library's random module.
I'm not in a position to argue the merits, but IMHO, when code that previously worked silently starts returning subtly bad results after importing numpy, there is a problem. What possible upside is there in having randint() behave one way in the random module and silently behave differently in numpy?
More generally, since numpy.random does not try to emulate the random module, how does one convert from code that uses the random module to numpy? Is randint() the only silent problem, or are there others? If so, how does one discover them? Are they documented anywhere?
I deeply appreciate the countless hours the core developers have contributed to numpy/scipy, but sometimes I think you are too close to the problems to fully appreciate the barriers to widespread adoption such silent "gotchas" present. If the code breaks, fine, you know there's a problem. When it runs, but returns wrong, but not obviously wrong, results, there's a serious problem that will deter a significant number of people from ever trying the product again.
Again, what is the upside of changing the behavior of the standard library's randint() without also changing the name?
rex
rex wrote:
I'm not in a position to argue the merits, but IMHO, when code that previously worked silently starts returning subtly bad results after importing numpy, there is a problem. What possible upside is there in having randint() behave one way in the random module and silently behave differently in numpy?
I don't understand you. Importing numpy does not change the standard library's random module in any way. There is no silent difference in behavior. If you use numpy.random you get one set of behavior. If you use random, you get another. Pick the one you want. They're not interchangeable, and nothing suggests that they ought to be.
More generally, since numpy.random does not try to emulate the random module, how does one convert from code that uses the random module to numpy? Is randint() the only silent problem, or are there others? If so, how does one discover them? Are they documented anywhere?
The docstrings in that module are complete.
I deeply appreciate the countless hours the core developers have contributed to numpy/scipy, but sometimes I think you are too close to the problems to fully appreciate the barriers to widespread adoption such silent "gotchas" present. If the code breaks, fine, you know there's a problem. When it runs, but returns wrong, but not obviously wrong, results, there's a serious problem that will deter a significant number of people from ever trying the product again.
Again, what is the upside of changing the behavior of the standard library's randint() without also changing the name?
Again, numpy.random has nothing to do with the standard library module random. The names of the functions match those in the PRNG facilities that used to be in Numeric and scipy which numpy.random is replacing. Specifically, numpy.random.randint() derives its behavior from Numeric's RandomArray.randint().
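The point that the two modules live in separate namespaces can be checked directly. This is a minimal sketch for a current Python and NumPy installation, not a statement about the 2006 releases discussed here.

```python
import random       # standard library module
import numpy as np  # importing NumPy leaves the name `random` untouched

# The standard library function is still the one bound to random.randint.
assert random.randint.__module__ == "random"

# NumPy's random functions live in their own namespace.
assert np.random is not random

# Only an explicit rebinding, such as `from numpy import random`,
# would make the bare name `random` refer to NumPy's module instead.
print(random.__name__, np.random.__name__)
```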
Robert Kern robert.kern@gmail.com [2006-09-08 06:51]:
rex wrote:
I'm not in a position to argue the merits, but IMHO, when code that previously worked silently starts returning subtly bad results after importing numpy, there is a problem. What possible upside is there in having randint() behave one way in the random module and silently behave differently in numpy?
I don't understand you.
That's because I wasn't making any sense. :(
Importing numpy does not change the standard library's random module in any way. There is no silent difference in behavior. If you use numpy.random you get one set of behavior. If you use random, you get another. Pick the one you want. They're not interchangeable, and nothing suggests that they ought to be.
Of course you're right. I thought the name would be overwritten, and it isn't. Sorry for wasting your time. :(
Thanks,
rex
Martin Spacek wrote:
What's the most straightforward way to count, say, the number of 1s or Trues in the array? Or the number of any integer?
I was surprised to discover recently that there isn't a count() method as there is for Python lists. Sorry if this has been discussed already, but I'm wondering if there's a reason for its absence.
Mostly, it's simply easy enough to implement yourself. Not all one-liners should be methods on the array object.
(a == value).sum()
Of course, there are several different things you might do. You might want to have multiple values broadcast across the array. You might want to reduce the count along a given axis. You might want to use floating point comparison with a tolerance. Putting all of those options into one method reduces the convenience of that method. Putting a crippled implementation (i.e. just that one-liner) expands the already-enormous API of the ndarray object without much benefit.
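The variations listed above might look like this in practice. The sample arrays, the variable names, and the atol value are made up for illustration, and np.isclose is just one possible way to compare with a tolerance.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 1]])

# The one-liner from the message: total occurrences of one value.
total_ones = (a == 1).sum()

# Reduce along an axis: occurrences of 1 in each column.
ones_per_column = (a == 1).sum(axis=0)

# Broadcast several values at once: one count per value.
values = np.array([1, 2, 3])
counts_per_value = (a[..., np.newaxis] == values).sum(axis=(0, 1))

# Floating-point comparison with a tolerance (atol chosen arbitrarily here).
x = np.array([0.1 + 0.2, 0.3, 0.5])
exact_matches = (x == 0.3).sum()
close_matches = np.isclose(x, 0.3, atol=1e-9).sum()

print(total_ones, ones_per_column, counts_per_value, exact_matches, close_matches)
```

Each variant is a small change to the same expression, which is the argument for not freezing any single one of them into a method.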
On 9/7/06, Martin Spacek scipy@mspacek.mm.st wrote:
What's the most straightforward way to count, say, the number of 1s or Trues in the array? Or the number of any integer?
I was surprised to discover recently that there isn't a count() method as there is for Python lists. Sorry if this has been discussed already, but I'm wondering if there's a reason for its absence.
You don't really need count with ndarrays:
>>> from numpy import *
>>> a = array([1,2,3,1,2,3,1,2])
>>> (a==3).sum()
2
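For the original question about counting "any integer", one possible extension is to tally every distinct value at once. np.unique with return_counts and np.bincount are not mentioned anywhere in this thread; they are shown here only as an assumption about what is convenient in current NumPy.

```python
import numpy as np

a = np.array([1, 2, 3, 1, 2, 3, 1, 2])

# Count one value, as in the message above.
threes = (a == 3).sum()

# Count every distinct value in one call.
values, counts = np.unique(a, return_counts=True)
tally = dict(zip(values.tolist(), counts.tolist()))

# For small non-negative integers, bincount indexes the counts by value.
by_value = np.bincount(a)

print(threes, tally, by_value)
```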
participants (7)
Alexander Belopolsky
Charles R Harris
Martin Spacek
Martin Spacek
rex
Robert Kern
Sasha