Hi all,
How can I obtain the multiplicity of an entry in a list?
a = ['abc','def','abc','ghij']
The multiplicity of 'abc' is 2. 'def' is 1. 'ghij' is 1.
Nils
On 10/26/2009 4:04 AM, Nils Wagner wrote:
how can I obtain the multiplicity of an entry in a list a = ['abc','def','abc','ghij']
That's a Python question, not a NumPy question. So comp.lang.python would be a better forum.
But here's the simplest solution::

    a = ['abc','def','abc','ghij']
    for item in set(a):
        print item, a.count(item)

This is horribly inefficient, of course: every a.count(item) call rescans the whole list. If you have a big list, it would be *much* better to use defaultdict::

    from collections import defaultdict
    myct = defaultdict(int)
    for item in a:
        myct[item] += 1
    print myct.items()
fwiw, Alan Isaac
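For readers on Python 3, where `print` is a function rather than a statement, the same single-pass count can be sketched with `collections.Counter` from the standard library. This is not what was posted above, just a minimal equivalent of the defaultdict loop.

```python
from collections import Counter

a = ['abc', 'def', 'abc', 'ghij']

# Counter walks the list once and maps each distinct entry
# to the number of times it occurs.
counts = Counter(a)

for item, n in counts.items():
    print(item, n)
```

Like the defaultdict version, this touches each element once, instead of rescanning the list for every distinct entry as `a.count(item)` does.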
Alan G Isaac wrote:
On 10/26/2009 4:04 AM, Nils Wagner wrote:
how can I obtain the multiplicity of an entry in a list a = ['abc','def','abc','ghij']
That's a Python question, not a NumPy question.
but we can make it a numpy question!
In [15]: a = np.array(['abc','def','abc','ghij'])

In [16]: a
Out[16]: array(['abc', 'def', 'abc', 'ghij'], dtype='|S4')

In [17]: for item in set(a): print item, (a == item).sum()
abc 2
ghij 1
def 1

I'll leave profiling to the OP.
-Chris
On Mon, Oct 26, 2009 at 2:12 PM, Christopher Barker Chris.Barker@noaa.gov wrote:
In [17]: for item in set(a): print item, (a == item).sum()
It's *very* slow when there are a large number of distinct items, because numpy creates a full boolean array for each item.
See also http://projects.scipy.org/scipy/ticket/905
Josef
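One way to avoid building a boolean array per item is to let NumPy sort the data once and count runs of equal values. The following is a minimal sketch, assuming NumPy 1.9 or later, where `np.unique` accepts `return_counts`; that option did not exist when this thread was written.

```python
import numpy as np

a = np.array(['abc', 'def', 'abc', 'ghij'])

# np.unique sorts the array once; with return_counts=True it also
# returns how many times each distinct value occurs.
values, counts = np.unique(a, return_counts=True)

for value, n in zip(values, counts):
    print(value, n)
```

The distinct values come back in sorted order, so the output order differs from iterating over `set(a)`.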
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
In principle you could use:
np.equal(a,a).sum(0)
but, for some unknown reason, np.equal operates only on "normal" arrays. Maybe you can transform the array into an array of numbers, for example by hashing.
Nadav
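The "transform to numbers" idea can be sketched without hashing by using the integer codes that `np.unique(..., return_inverse=True)` produces; this swaps in sorting for the hash suggested above. Reading the intended comparison as a pairwise one (`np.equal.outer`) is an assumption, not something the message states.

```python
import numpy as np

a = np.array(['abc', 'def', 'abc', 'ghij'])

# Map each string to a small integer code; equal strings get equal codes.
_, codes = np.unique(a, return_inverse=True)

# Pairwise comparison of the codes: column j counts how many entries
# equal a[j]. This needs O(n**2) memory, so it only suits small arrays.
per_entry = np.equal.outer(codes, codes).sum(axis=0)

# Same per-entry multiplicity in O(n) memory: count each code once,
# then look the count up for every entry.
per_entry_fast = np.bincount(codes)[codes]

print(per_entry)
print(per_entry_fast)
```

Both results give the multiplicity of each entry in its original position, rather than one count per distinct value.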
-----Original Message----- From: numpy-discussion-bounces@scipy.org on behalf of josef.pktd@gmail.com Sent: Mon 26-October-09 20:26 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Multiplicity of an entry
Nadav Horesh wrote:
np.equal(a,a).sum(0)
but, for unknown reason, np.equal operates only on "normal" arrays.
true:

In [25]: a
Out[25]: array(['abc', 'def', 'abc', 'ghij'], dtype='|S4')

In [27]: np.equal(a,a)
Out[27]: NotImplemented

however:

In [28]: a == a
Out[28]: array([ True, True, True, True], dtype=bool)

Don't they use the same code? Or is "==" reverting to plain old generic Python sequence comparison, which would partly explain why it is so slow?
maybe you can transform the array to arrays of numbers, for example by hash.
or even easier:
In [32]: a2 = a.view(dtype=np.int32)

In [33]: a2
Out[33]: array([1633837824, 1684366848, 1633837824, 1734895978])

In [34]: np.equal(a2, a2[0])
Out[34]: array([ True, False, True, False], dtype=bool)
though that only works if your strings are a handy length like 4 bytes...
-Chris
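The "handy length" restriction can be made explicit with a small guard. The sketch below uses a hypothetical helper, `as_int_codes` (not part of NumPy), that reinterprets fixed-width byte strings as unsigned integers only when the item size matches an integer width, and refuses otherwise.

```python
import numpy as np

def as_int_codes(a):
    """Hypothetical helper: view fixed-width byte strings as integers."""
    int_types = {1: np.uint8, 2: np.uint16, 4: np.uint32, 8: np.uint64}
    if a.dtype.kind != 'S' or a.dtype.itemsize not in int_types:
        raise TypeError(f"need byte strings of width 1, 2, 4 or 8, got {a.dtype}")
    # view() reinterprets the same memory; nothing is copied.
    return a.view(int_types[a.dtype.itemsize])

a = np.array([b'abc', b'def', b'abc', b'ghij'])  # dtype 'S4'
a2 = as_int_codes(a)

# The integer values depend on the machine's byte order, but equality
# between entries does not, so only the comparison is shown.
matches = np.equal(a2, a2[0])
print(matches, matches.sum())
```

An array such as `np.array([b'abc', b'de'])` has dtype `S3` and is rejected by the guard, which is the case the remark above warns about.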
Christopher Barker wrote:
don't they use the same code? or is "==" reverting to plain old generic python sequence comparison, which would partly explain why it is so slow.
It looks as if "a == a" (that is, array_richcompare) is triggering special-case code for strings, so it is fast. However, IMHO np.equal should be made to work as well. Can you file a bug and assign it to me? (I'm dealing with a number of other string-related things, so I might as well take this too.)
Mike
On Oct 27, 2009, at 2:31 PM, Michael Droettboom wrote:
It looks as if "a == a" (that is array_richcompare) is triggering special case code for strings, so it is fast. However, IMHO np.equal should be made to work as well. Can you file a bug and assign it to me (I'm dealing with a number of other string-related things, so I might as well take this too).
array_richcompare special-cases strings not for speed but for actual functionality.
Making np.equal work with strings requires changes to the ufunc code itself, which was never written to work with "variable-length" data-types (like strings, unicode, and records). There are several things that would have to be fixed. Some of the changes we made to allow for date-time data-types also made it possible to support variable-length strings, but this is non-trivial to implement. It's certainly possible, but I would want to look at any changes you make before committing them, to make sure all the issues are understood.
Thanks,
-Travis
--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliphant@enthought.com
Travis Oliphant wrote:
Making np.equal work with strings requires changes to the ufunc code itself which was never written to work with "variable-length" data-types (like strings, unicode, and records). There are several things that would have to be fixed. Some of the changes we made to allow for date-time data-types also made it possible to support variable-length strings, but this is non-trivial to implement. It's certainly possible, but I would want to look at any changes you make before committing them to make sure all the issues are being understood.
Yeah -- I'm realizing this is a bigger project than I initially suspected. I'll keep you posted if I find the time to do this right.
Mike