Memory efficient tuple storage

Fri Mar 13 13:13:36 EDT 2009

On Fri, Mar 13, 2009 at 11:33 AM, Kurt Smith <kwmsmith at gmail.com> wrote:
[snip OP]
>
> Assuming your data is in a plaintext file something like
> 'genomedata.txt' below, the following will load it into a numpy array
> with a customized dtype.  You can access the different fields by name
> ('chromo', 'position', and 'dpoint' -- change to your liking).  Don't
> know if this works or not; might give it a try.

To clarify -- I don't know if this will work for your particular
problem, but I do know that it will read in the array correctly and
cut down on the memory used by the final array.

Specifically, if you use a dtype with 'S50', 'i4' and 'f8' (see the
numpy dtype docs) -- that's 50 bytes for your chromosome string, 4
bytes for the position and 8 bytes for the data point -- each entry
will use just 50 + 4 + 8 = 62 bytes, and the numpy array will have
just enough memory allocated for all of these records.  The values
stored in the array will be a fixed-width char array for the string,
a C int and a C double; it won't use the corresponding Python objects,
each of which carries its own per-object overhead on top of the data.

Hope this helps,

Kurt