[Numpy-discussion] reading *big* inhomogenous text matrices *fast*?
Daniel Lenski
dlenski at gmail.com
Thu Aug 14 00:32:45 EDT 2008
On Wed, 13 Aug 2008 21:42:51 -0500, Robert Kern wrote:
> Here is the appropriate snippet in Objects/listobject.c:
>
>     /* This over-allocates proportional to the list size, making room
>      * for additional growth.  The over-allocation is mild, but is
>      * enough to give linear-time amortized behavior over a long
>      * sequence of appends() in the presence of a poorly-performing
>      * system realloc().
>      * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
>      */
>     new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize;
>
> Raymond Hettinger had a good talk at PyCon this year about the details
> of the Python containers. Here are the slides from the EuroPython
> version (I assume).
>
> http://www.pycon.it/static/pycon2/slides/containers.ppt
>
Thanks! Looks like the only caveat is that the whole thing may slow down
if the reallocation operation itself is very inefficient, which probably
isn't the case with a modern Linux distro and a recent libc. I'm thinking
whatever went wrong had to be my fault :-)
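
For reference, here is a small Python sketch that replays the quoted
formula to reproduce the growth pattern from the comment. The function
names are made up for illustration, and the formula is only the one
quoted above; other CPython versions may use a different expression.

```python
def new_allocated(newsize):
    """Mirror the quoted listobject.c expression for a requested size."""
    return (newsize >> 3) + (3 if newsize < 9 else 6) + newsize


def growth_pattern(n_appends):
    """Return the successive capacities seen while appending one item
    at a time, starting from an empty list with capacity 0."""
    allocated = 0
    capacities = [0]
    for newsize in range(1, n_appends + 1):
        if newsize > allocated:
            # The list is full, so a reallocation would happen here.
            allocated = new_allocated(newsize)
            capacities.append(allocated)
    return capacities


if __name__ == "__main__":
    print(growth_pattern(88))
    # Reallocations are rare compared to appends.
    print(len(growth_pattern(1000000)) - 1, "reallocations for 1e6 appends")
```

Each reallocation grows the capacity by roughly one eighth, so the
number of reallocations grows only logarithmically with the number of
appends.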
> Primarily, it's the fact that we have views of arrays that might be
> floating around that prevents us from reallocating as a matter of
> course. Now, we do have a .resize() method which will explicitly
> reallocate the array, but it will only work if you don't have any views
> on the array floating around. During your file reading, this is probably
> valid, so you may want to give it a try using a similar reallocation
> strategy as lists. I'd be interested in seeing some benchmarks comparing
> this strategy with the others.
That will be the next thing for me to try if my current approach becomes
too memory-inefficient. Good idea!
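
A minimal sketch of what that could look like, assuming a 1-D array
filled one value at a time. GrowableArray, append() and finalize() are
hypothetical names, not NumPy API; only ndarray.resize() is. The
starting capacity of 4 is arbitrary, and the growth rule is borrowed
from the list formula quoted earlier. I pass refcheck=False because the
helper never hands out views of its buffer; if views did exist, that
would be unsafe.

```python
import numpy as np


class GrowableArray:
    """Hypothetical helper: a 1-D array that over-allocates like a list."""

    def __init__(self, dtype=float):
        self._data = np.empty(4, dtype=dtype)  # arbitrary starting capacity
        self._size = 0                         # number of slots in use

    def append(self, value):
        if self._size == self._data.shape[0]:
            n = self._size + 1
            # Same mild over-allocation as the quoted list formula.
            capacity = (n >> 3) + (3 if n < 9 else 6) + n
            # In-place reallocation. refcheck=False skips NumPy's
            # reference check; this is only safe because no views of
            # self._data are ever exposed outside this class.
            self._data.resize(capacity, refcheck=False)
        self._data[self._size] = value
        self._size += 1

    def finalize(self):
        """Return a copy of the used portion, dropping the spare capacity."""
        return self._data[:self._size].copy()


if __name__ == "__main__":
    g = GrowableArray()
    for i in range(1000):
        g.append(i)
    print(g.finalize().shape)
```

Whether this beats appending to a list and converting at the end is
exactly what the benchmarks would have to show.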
Dan