[Numpy-discussion] reading *big* inhomogenous text matrices *fast*?

Zachary Pincus zachary.pincus at yale.edu
Wed Aug 13 22:11:07 EDT 2008


> This is similar to what I tried originally!  Unfortunately, repeatedly
> appending to a list seems to be very slow... I guess Python keeps
> reallocating and copying the list as it grows.  (It would be nice to
> be able to tune the increments by which the list size increases.)

Robert's right, as ever -- repeated appending to a list is an
*extremely* common operation in idiomatic Python, and list.append is
correspondingly fast: CPython over-allocates the list's storage as it
grows, so appends are amortized O(1) and don't copy the whole list
each time.

Try profiling the code just to make sure that it is the list append
that's slow, and not something else happening on that line.
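
For example, something along these lines, using the standard-library
cProfile module (a minimal sketch; 'cells.txt' and parse_file are
stand-ins for your actual file and reading loop):

    import cProfile
    import pstats

    def parse_file(path):
        # Stand-in for the real reading loop: split each line and
        # append the parsed fields to a growing list.
        rows = []
        for line in open(path):
            rows.append([int(x) for x in line.split()])
        return rows

    cProfile.run("parse_file('cells.txt')", 'parse.prof')
    pstats.Stats('parse.prof').sort_stats('cumulative').print_stats(10)

If list.append dominates the time you'll see it at the top of the
listing; if it's the splitting and int() conversion instead, that
will show up there too.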

> I hope this recipe may prove useful to others.  It would be nice if
> NumPy had a built-in facility for arrays that intelligently expand
> their allocation as they grow.

It appears to be the general consensus on this mailing list that the
best solution when an expandable array is required is to append to a
Python list and then, once you've built it up completely, convert it
to an array. So I'm surprised that this is turning out to be so slow
for you... but if the profiler says that's where the trouble is, then
so it is.
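
For reference, the pattern people usually mean is something like this
(a sketch; the file name and the per-line parsing are placeholders
for your actual format):

    import numpy as np

    rows = []
    for line in open('nodes.txt'):
        # Parse each line into a list of ints; append is amortized O(1).
        rows.append([int(x) for x in line.split()])

    # Pay for the array allocation and copy exactly once, at the end.
    data = np.array(rows, dtype=int)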


>> Also you could see if:
>>   cells[type].append(numpy.array([index, property]+nodes, dtype=int))
>>
>> is faster than what's above... it's worth testing.
>
> Repeatedly concatenating arrays with numpy.append or
> numpy.concatenate is also quite slow, unfortunately. :-(
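
That's expected, unfortunately: numpy.append and numpy.concatenate
each return a brand-new array and copy everything accumulated so far
into it, so n repeated appends do O(n^2) total copying. A minimal
illustration of the pattern to avoid (with a made-up row shape):

    import numpy as np

    # Quadratic total work: every iteration reallocates and copies
    # all previously accumulated rows into a fresh array.
    data = np.empty((0, 6), dtype=int)
    for i in range(1000):
        row = np.arange(6, dtype=int) + i   # dummy row data
        data = np.concatenate([data, row[np.newaxis, :]])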

Actually, my suggestion was to compare building up a list-of-lists and
then converting that to a 2d array, versus building up a
list-of-arrays and then converting that to a 2d array... one might
wind up being faster or more memory-efficient than the other.
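
Concretely, the two variants I meant are something like the following
(a sketch with dummy row data standing in for your index, property,
and node fields):

    import numpy as np

    n = 10000

    # Variant 1: build a list of plain Python lists, convert once.
    rows = [[i, 0, 1, 2, 3, 4] for i in range(n)]
    a1 = np.array(rows, dtype=int)

    # Variant 2: build a list of small 1-d arrays, convert once.
    # np.array stacks equal-length 1-d arrays into a 2-d array.
    rows = [np.array([i, 0, 1, 2, 3, 4], dtype=int) for i in range(n)]
    a2 = np.array(rows)

    assert (a1 == a2).all()

Timing each with the timeit module on your real data would settle
which is better for your case.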

Zach
