[Numpy-discussion] Need faster equivalent to digitize

Peter Shinners pete at shinners.org
Thu Apr 15 02:45:31 EDT 2010


On 04/14/2010 11:34 PM, Nadav Horesh wrote:
> import numpy as N
> N.repeat(N.arange(len(a)), a)
>
>    Nadav
>
> -----Original Message-----
> From: numpy-discussion-bounces at scipy.org on behalf of Peter Shinners
> Sent: Thu 15-Apr-10 08:30
> To: Discussion of Numerical Python
> Subject: [Numpy-discussion] Need faster equivalent to digitize
>
> I am using digitize to create a list of indices. This is giving me
> exactly what I want, but it's terribly slow. Digitize is obviously not
> the tool I want for this case, but what numpy alternative do I have?
>
> I have an array like np.array((4, 3, 3)). I need to create an index
> array with each index repeated by its value: np.array((0, 0, 0, 0,
> 1, 1, 1, 2, 2, 2)).
>
>   >>>  a = np.array((4, 3, 3))
>   >>>  b = np.arange(np.sum(a))
>   >>>  c = np.digitize(b, np.cumsum(a))
>   >>>  print c
> [0 0 0 0 1 1 1 2 2 2]
>
> On an array where a.size==65536 and sum(a)==65536 this is taking over 6
> seconds to compute. As a comparison, using a Python list solution runs
> in 0.08 seconds. That is plenty fast, but I would guess there is a
> faster Numpy solution that does not require a dynamically growing
> container of PyObjects?
>
>   >>>  a = np.array((4, 3, 3))
>   >>>  c = []
>   >>>  for i, v in enumerate(a):
> ...     c.extend([i] * v)
>    

Excellent. The Numpy version is a bit faster, and I prefer having an 
ndarray as the end result.
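
For anyone finding this thread later, here is a minimal sketch of the three approaches discussed above, side by side. The function names are made up for illustration and are not part of NumPy. The digitize variant assumes the bins are the running totals of the counts (np.cumsum), which is what produces the output shown in the original question.

```python
import numpy as np

def repeat_indices(counts):
    # Suggested approach: repeat each index i exactly counts[i] times.
    counts = np.asarray(counts)
    return np.repeat(np.arange(counts.size), counts)

def repeat_indices_digitize(counts):
    # Slow approach from the question; the bin edges are assumed to be
    # the cumulative sums of the counts.
    counts = np.asarray(counts)
    return np.digitize(np.arange(counts.sum()), np.cumsum(counts))

def repeat_indices_list(counts):
    # Pure-Python approach: grow a list, one run of indices at a time.
    out = []
    for i, v in enumerate(counts):
        out.extend([i] * int(v))
    return out

a = np.array((4, 3, 3))
c = repeat_indices(a)
print(c)  # [0 0 0 0 1 1 1 2 2 2]
```

All three should agree on the example input; only the np.repeat version avoids both the per-element bin search and the intermediate Python list.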


