[Numpy-discussion] Can I map an np.ndarray subclass to a text file instead of reading the file into memory?

kee chen keekychen.shared at gmail.com
Sat Oct 2 11:23:57 EDT 2010


Dear All,

I have a memory problem when reading data from a text file into an np.ndarray,
because my PC has little memory and the data set is too big.
The data is stored as three columns of text and may have 10000000 records. It
looks like this:

0.64984279 0.587856227 0.827348652
0.33463377 0.210916859 0.608797746
0.230265156 0.390278562 0.186308355
0.431187207 0.127007937 0.949673389
...

10000000 LINES OMITTED HERE
...
0.150027782 0.800999655 0.551508963
0.255163742 0.785462049 0.015694154


After googling, I found three approaches that may solve this problem:
    1. hardware upgrade (more memory, move to an x64 architecture, ...)
    2. filter the data before processing
    3. use PyTables
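
For illustration, option 2 could be approached by streaming the file in
fixed-size chunks, so that only one chunk is parsed and held in memory at a
time. This is a minimal sketch, not a tested solution for the full data set:
the helper names iter_chunks and column_means and the chunk size are
assumptions, and the demo uses only the four sample rows quoted above.

```python
import io
from itertools import islice

import numpy as np


def iter_chunks(lines, chunk_rows=100_000):
    """Yield 2-D float arrays holding at most chunk_rows rows each."""
    lines = iter(lines)  # make sure islice continues where it left off
    while True:
        block = list(islice(lines, chunk_rows))
        if not block:
            return
        # Only this block is parsed, so memory use is bounded by chunk_rows.
        yield np.loadtxt(block, ndmin=2)


def column_means(lines, ncols=3, chunk_rows=100_000):
    """Compute per-column means without holding the whole file in memory."""
    total = np.zeros(ncols)
    count = 0
    for chunk in iter_chunks(lines, chunk_rows):
        total += chunk.sum(axis=0)
        count += len(chunk)
    return total / count


# Demo on the sample rows; a real run would pass an open file object instead.
SAMPLE_TEXT = (
    "0.64984279 0.587856227 0.827348652\n"
    "0.33463377 0.210916859 0.608797746\n"
    "0.230265156 0.390278562 0.186308355\n"
    "0.431187207 0.127007937 0.949673389\n"
)
means = column_means(io.StringIO(SAMPLE_TEXT), chunk_rows=2)
```

Any filtering step would go inside the loop over chunks, keeping only the
rows of interest before they are accumulated.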

However, I am trying to think of another possibility: a memory-time trade-off.

Can I design a class that inherits from np.ndarray and maps it onto the
text file?
It might work like this: the class keeps only a single row object and the
total number of rows in the file. The row mapping might look like
this:

a row object   <--- bind --->   row ID in text file   <--- bind --->   function row_reader()

When an np function is applied to this object, the actual data comes from the
function row_reader(actual row ID).
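
The idea above can be sketched roughly as follows. This is only an
illustration under stated assumptions: TextRowArray and its read_row method
are hypothetical names (read_row plays the role of the row-reading function
described above), and the class is a plain array-like wrapper rather than a
true np.ndarray subclass, since an ndarray normally expects its data to sit
in a memory buffer. It keeps one byte offset per row and parses a row from
disk only when that row is requested.

```python
import os
import tempfile

import numpy as np


class TextRowArray:
    """Hypothetical read-only, row-indexed view over a whitespace-delimited text file."""

    def __init__(self, path, ncols=3):
        self.path = path
        self.ncols = ncols
        # One pass over the file to record where each non-blank row starts.
        offsets = []
        pos = 0
        with open(path, "rb") as f:
            for line in f:
                if line.strip():
                    offsets.append(pos)
                pos += len(line)
        self._offsets = np.asarray(offsets, dtype=np.int64)

    def __len__(self):
        return len(self._offsets)

    @property
    def shape(self):
        return (len(self), self.ncols)

    def read_row(self, row_id):
        """Seek to the start of row row_id and parse just that line."""
        with open(self.path, "rb") as f:
            f.seek(int(self._offsets[row_id]))
            fields = f.readline().decode("ascii").split()
        return np.array([float(v) for v in fields])

    def __getitem__(self, key):
        if isinstance(key, slice):
            ids = range(*key.indices(len(self)))
            return np.array([self.read_row(i) for i in ids])
        return self.read_row(key)


# Demo with a small temporary file holding the four sample rows quoted earlier.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(
        "0.64984279 0.587856227 0.827348652\n"
        "0.33463377 0.210916859 0.608797746\n"
        "0.230265156 0.390278562 0.186308355\n"
        "0.431187207 0.127007937 0.949673389\n"
    )
rows = TextRowArray(tmp.name)
first = rows[0]        # parsed from disk on demand
last_two = rows[2:4]   # a small slice materialized as a regular array
os.remove(tmp.name)
```

Two caveats on this design: the offset index itself costs 8 bytes per row
(roughly 80 MB for 10000000 rows), and every access reopens and re-parses
the file, so this trades a lot of time for the memory saved. NumPy functions
applied to the whole object would still need the data materialized, for
example slice by slice.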

I have no idea how to code it. Could I get some help here with designing such a
class? Thanks!
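
As a related sketch: NumPy already ships an ndarray subclass backed by a file
on disk, np.memmap, although it maps raw binary data rather than text. One
possible route is therefore a one-time, chunked conversion of the text file
to a binary file, which is then memory-mapped. The helper names count_rows
and text_to_binary, the file names data.txt and data.bin, and the chunk size
are assumptions made for illustration. For scale, 10000000 rows x 3 columns
x 8 bytes per float64 comes to 240000000 bytes on disk.

```python
import os
import tempfile
from itertools import islice

import numpy as np


def count_rows(text_path):
    """Count non-blank lines so the binary file can be sized up front."""
    with open(text_path) as f:
        return sum(1 for line in f if line.strip())


def text_to_binary(text_path, bin_path, ncols=3, chunk_rows=100_000):
    """Convert the text file to raw float64 data, one chunk at a time."""
    n_rows = count_rows(text_path)
    out = np.memmap(bin_path, dtype=np.float64, mode="w+", shape=(n_rows, ncols))
    row = 0
    with open(text_path) as f:
        while True:
            block = list(islice(f, chunk_rows))
            if not block:
                break
            data = np.loadtxt(block, ndmin=2)
            out[row:row + len(data)] = data
            row += len(data)
    out.flush()
    del out  # close the writable mapping
    return n_rows


# Demo on the four sample rows quoted earlier, using a temporary directory.
SAMPLE_TEXT = (
    "0.64984279 0.587856227 0.827348652\n"
    "0.33463377 0.210916859 0.608797746\n"
    "0.230265156 0.390278562 0.186308355\n"
    "0.431187207 0.127007937 0.949673389\n"
)
with tempfile.TemporaryDirectory() as tmpdir:
    text_path = os.path.join(tmpdir, "data.txt")
    bin_path = os.path.join(tmpdir, "data.bin")
    with open(text_path, "w") as f:
        f.write(SAMPLE_TEXT)
    n_rows = text_to_binary(text_path, bin_path, chunk_rows=3)
    # Read-only mapping: pages are loaded from disk only as they are touched.
    data = np.memmap(bin_path, dtype=np.float64, mode="r", shape=(n_rows, 3))
    col_means = np.array(data.mean(axis=0))
    del data  # release the mapping before the directory is removed
```

The conversion pays the text-parsing cost once; afterwards the mapped array
can be passed to most NumPy functions like an ordinary array.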


Rgs,

KC