[Numpy-discussion] Need faster equivalent to digitize
Peter Shinners
pete at shinners.org
Thu Apr 15 02:45:31 EDT 2010
On 04/14/2010 11:34 PM, Nadav Horesh wrote:
> import numpy as N
> N.repeat(N.arange(len(a)), a)
>
> Nadav
>
> -----Original Message-----
> From: numpy-discussion-bounces at scipy.org on behalf of Peter Shinners
> Sent: Thu 15-Apr-10 08:30
> To: Discussion of Numerical Python
> Subject: [Numpy-discussion] Need faster equivalent to digitize
>
> I am using digitize to create a list of indices. This is giving me
> exactly what I want, but it's terribly slow. Digitize is obviously not
> the tool I want for this case, but what numpy alternative do I have?
>
> I have an array like np.array((4, 3, 3)). I need to create an index
> array with each index repeated as many times as its value:
> np.array((0, 0, 0, 0, 1, 1, 1, 2, 2, 2)).
>
> >>> import numpy as np
> >>> a = np.array((4, 3, 3))
> >>> b = np.arange(np.sum(a))
> >>> c = np.digitize(b, np.cumsum(a))
> >>> print(c)
> [0 0 0 0 1 1 1 2 2 2]
>
> On an array where a.size==65536 and sum(a)==65536, this takes over 6
> seconds to compute. As a comparison, a Python list solution runs
> in 0.08 seconds. That is plenty fast, but I would guess there is a
> faster Numpy solution that does not require a dynamically growing
> container of PyObjects?
>
> >>> import numpy as np
> >>> a = np.array((4, 3, 3))
> >>> c = []
> >>> for i, v in enumerate(a):
> ...     c.extend([i] * v)
> ...
>
Excellent. The Numpy version is a bit faster, and I prefer having an
ndarray as the end result.
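For readers who want to check the three approaches against each other, here is a minimal sketch (not code from the thread). It assumes the digitize bins are the running totals np.cumsum(a), which is what reproduces the printed output in the original question; the function names with_digitize, with_list and with_repeat and the multinomial-generated benchmark input are hypothetical. No timings are quoted here because they depend on the NumPy version and the machine.

```python
import timeit

import numpy as np


def with_digitize(a):
    # Assumption: the bin edges are the running totals of the counts, so
    # position k lands in bin i when cumsum[i-1] <= k < cumsum[i].
    return np.digitize(np.arange(a.sum()), np.cumsum(a))


def with_list(a):
    # Pure-Python reference: grow a list of ints, then convert to an ndarray.
    c = []
    for i, v in enumerate(a):
        c.extend([i] * int(v))
    return np.array(c, dtype=np.intp)


def with_repeat(a):
    # Nadav's suggestion: repeat each index i exactly a[i] times.
    return np.repeat(np.arange(len(a)), a)


small = np.array((4, 3, 3))
for fn in (with_digitize, with_list, with_repeat):
    print(fn.__name__, fn(small))

# Hypothetical benchmark input: 65536 non-negative counts summing to 65536,
# matching the sizes quoted in the thread (the distribution is an assumption).
rng = np.random.default_rng(0)
big = rng.multinomial(65536, np.full(65536, 1.0 / 65536))
reference = with_repeat(big)
for fn in (with_digitize, with_list, with_repeat):
    # All three must agree element-wise before their timings mean anything.
    assert np.array_equal(fn(big), reference)
    seconds = timeit.timeit(lambda: fn(big), number=5) / 5
    print(f"{fn.__name__:>14}: {seconds * 1e3:.2f} ms per call")
```

np.repeat computes the output length from the counts and fills a single preallocated array, so it avoids both a search per output element (current NumPy implements digitize on top of searchsorted) and Python-level list growth. The integer dtype of the result can differ between the approaches, which is why the check uses np.array_equal rather than comparing dtypes.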