[Numpy-discussion] reading *big* inhomogenous text matrices *fast*?

Daniel Lenski dlenski at gmail.com
Thu Aug 14 00:32:45 EDT 2008


On Wed, 13 Aug 2008 21:42:51 -0500, Robert Kern wrote:

> Here is the appropriate snippet in Objects/listobject.c:
> 
>         /* This over-allocates proportional to the list size, making room
>          * for additional growth.  The over-allocation is mild, but is
>          * enough to give linear-time amortized behavior over a long
>          * sequence of appends() in the presence of a poorly-performing
>          * system realloc().
>          * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
>          */
>         new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize;
> 
> Raymond Hettinger had a good talk at PyCon this year about the details
> of the Python containers. Here are the slides from the EuroPython
> version (I assume).
> 
>   http://www.pycon.it/static/pycon2/slides/containers.ppt
> 

Thanks!  It looks like the only caveat is that the whole thing may slow
down if the reallocation operation itself is very inefficient, which
probably isn't the case with a modern Linux distro and a recent libc.
I'm thinking whatever went wrong had to be my fault :-)
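
To see what the quoted rule does in practice, here is a small Python
transcription of the `new_allocated` expression.  The helper names are
hypothetical, and the sketch only models the growth path of the C code
when appending one item at a time (it ignores shrinking).  It also counts
element copies under the pessimistic assumption that every reallocation
has to move the whole buffer, i.e. the "poorly-performing system
realloc()" case from the comment:

```python
def new_allocated(newsize):
    """Python transcription of the C expression quoted from Objects/listobject.c."""
    return (newsize >> 3) + (3 if newsize < 9 else 6) + newsize


def simulate_appends(n_appends):
    """Append n_appends items one at a time, growing only when the buffer is full.

    Returns the list of capacities seen and the number of element copies that
    would happen if every reallocation had to move all existing items
    (the worst-case realloc assumed here).
    """
    allocated = 0
    capacities = [allocated]
    copies = 0
    for size in range(1, n_appends + 1):
        if size > allocated:
            copies += size - 1  # worst case: realloc moves every item already stored
            allocated = new_allocated(size)
            capacities.append(allocated)
    return capacities, copies


caps, _ = simulate_appends(80)
print(caps)  # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]

n = 1_000_000
_, copies = simulate_appends(n)
print(copies / n)  # always less than 9, however large n is
```

Each reallocation grows the capacity by more than a factor of 9/8, so the
copies form a geometric series bounded by 9n.  Even a realloc that always
copies therefore keeps appends amortized O(1); it only makes the constant
larger.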

> Primarily, it's the fact that we have views of arrays that might be
> floating around that prevents us from reallocating as a matter of
> course. Now, we do have a .resize() method which will explicitly
> reallocate the array, but it will only work if you don't have any views
> on the array floating around. During your file reading, this is probably
> valid, so you may want to give it a try using a similar reallocation
> strategy as lists. I'd be interested in seeing some benchmarks comparing
> this strategy with the others.

That will be the next thing for me to try if my current approach becomes 
too memory-inefficient.  Good idea!

Dan



