[Numpy-discussion] memory-efficient loadtxt
chris.barker at noaa.gov
Wed Oct 3 12:22:35 EDT 2012
On Wed, Oct 3, 2012 at 9:05 AM, Paul Anton Letnes
<paul.anton.letnes at gmail.com> wrote:
>> I'm not sure the problem you are trying to solve -- accumulating in a
>> list is pretty efficient anyway -- not a whole lot overhead.
> Oh, there's significant overhead, since we're not talking of a list - we're talking of a list-of-lists.
Hmm, a list of numpy scalars (with a custom dtype) would be a better
option, though maybe not all that much better -- there's still an extra
pointer and Python object for each row.
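A minimal sketch of what that would look like (the dtype and field names here are illustrative assumptions, not from this thread): each parsed row becomes a 0-d structured scalar held in a plain Python list, and the final array is built in one pass at the end -- note each scalar still costs a Python object plus a list pointer, as mentioned above.

```python
import numpy as np

# Illustrative row layout -- a real loader would build this from the file header.
row_dtype = np.dtype([('x', 'f8'), ('y', 'f8'), ('flag', 'i4')])

rows = []
for i in range(3):
    # np.array(...)[()] extracts a 0-d structured scalar (np.void);
    # one Python object per row, but a single custom-dtype item, not a sub-list.
    rows.append(np.array((i * 0.5, i * 1.5, i), dtype=row_dtype)[()])

# One final pass builds the contiguous result array.
result = np.array(rows, dtype=row_dtype)
print(result.shape)   # (3,)
```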
> I see your point - but if you're to return a single array, and the file is close to the total system memory, you've still got a factor of 2 issue when shuffling the binary data from the accumulator into the result array. That is, unless I'm missing something here?
Indeed, I think that's how my current accumulator works -- the
__array__() method returns a copy of the data buffer, so that you
won't accidentally re-allocate it under the hood later and screw up
the returned version.
But it is indeed accumulating in a numpy array, so it should be
possible, maybe even easy, to turn it into a regular array without a
data copy. You'd just have to destroy the original somehow (or mark it
as never-resize) so the two wouldn't clash. Messing with the
OWNDATA flag might take care of that.
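Here's a rough sketch of that hand-off idea (my own illustration, not the actual accumulator code): grow a buffer in place with ndarray.resize, then on finalize shrink it to the exact length and drop the accumulator's own reference, so the caller gets the data-owning array with no copy and the original can't be resized out from under it.

```python
import numpy as np

class Accumulator:
    """Sketch: append rows into a geometrically grown buffer, then
    hand the buffer itself to the caller instead of copying it."""

    def __init__(self, dtype, capacity=8):
        self._buf = np.empty(capacity, dtype=dtype)
        self._len = 0

    def append(self, value):
        if self._len == len(self._buf):
            # Grow in place; refcheck=False since we hold the only reference.
            self._buf.resize(2 * len(self._buf), refcheck=False)
        self._buf[self._len] = value
        self._len += 1

    def finalize(self):
        # Shrink to the exact size and give up our reference -- the
        # "destroy the original" step, so no one can resize it later.
        self._buf.resize(self._len, refcheck=False)
        out, self._buf = self._buf, None
        return out
```

The returned array still has OWNDATA set, because it *is* the buffer; the trick is simply that the accumulator forgets it ever held it.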
But it seems Wes has a better solution.
One other note, though -- if you have arrays that are that close to
max system memory, you are very likely to have other trouble anyway --
numpy does make a lot of copies!
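For example (my illustration, not from the thread), some common operations silently double your memory use while others only create views:

```python
import numpy as np

a = np.arange(10)

view = a[2:5]                        # basic slicing: a view, no new buffer
mask_copy = a[a > 4]                 # boolean (fancy) indexing: a full copy
astype_copy = a.astype(np.float64)   # dtype conversion: always a copy

print(np.shares_memory(a, view))       # True
print(np.shares_memory(a, mask_copy))  # False
```

With data near total system memory, even one of these copies can push you into swap.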
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov