Not sure why this is filling my sys memory

Jonathan Gardner jgardner at
Sun Feb 21 06:34:41 CET 2010

On Sat, Feb 20, 2010 at 5:53 PM, Vincent Davis <vincent at> wrote:
> On Sat, Feb 20, 2010 at 6:44 PM, Jonathan Gardner <jgardner at> wrote:
>> With this kind of data set, you should start looking at BDBs or
>> PostgreSQL to hold your data. While processing files this large is
>> possible, it isn't easy. Your time is better spent letting the DB
>> figure out how to arrange your data for you.
> I really do need all of it at one time; it is DNA microarray data. Sure, there are 230,000 rows, but only 4 columns of small numbers. Would it help to make them float()? I need to at some point. I know in numpy there is a way to set the type for the whole array, "astype()" I think.
> What I don't get is that it shows the size of the dict with all the data to be only 6424 bytes. What is using up all the memory?
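
To answer the getsizeof() puzzle first: sys.getsizeof() reports only the
dict object's own overhead, not the row objects it references, so the
6424-byte figure doesn't count your actual data at all. A minimal sketch
of the difference (the shape and dtype below are assumptions based on
your description):

    import sys
    import numpy as np

    # Stand-in for the parsed file: 230,000 rows of 4 small numbers.
    rows = [[1.0, 2.0, 3.0, 4.0] for _ in range(230000)]
    d = {'data': rows}

    # getsizeof() measures only the dict's own bookkeeping, not the
    # lists (and floats) it points to, so this number stays tiny.
    print(sys.getsizeof(d))

    # A numpy array stores the same values contiguously; astype()
    # converts the whole array in one step, as you guessed.
    arr = np.array(rows).astype(np.float64)
    print(arr.nbytes)   # 230000 * 4 * 8 bytes, about 7 MB

The per-object overhead of a Python list of lists of floats is what is
actually eating your memory; a single float64 array holds the same
230,000 x 4 values in roughly 7 MB.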

Look into getting PostgreSQL to organize the data for you. It's much
easier to do this kind of processing properly with a database handle
than with a file handle. You may also discover that writing functions
in Python inside of PostgreSQL scales very well for whatever data
needs you have.
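
For concreteness, here is a rough sketch using psycopg2 (the table
name, column names, file name, and connection string are all made up;
adapt them to your data):

    import psycopg2

    conn = psycopg2.connect("dbname=microarray")
    cur = conn.cursor()

    # Hypothetical table for 230,000 rows of 4 numeric columns.
    cur.execute("CREATE TABLE probes (c1 real, c2 real, c3 real, c4 real)")

    # Bulk-load the file with COPY instead of parsing it row by row
    # in Python; the server does the heavy lifting.
    f = open('data.txt')
    cur.copy_from(f, 'probes', sep='\t')
    f.close()

    # Let the database aggregate; only the result row comes back
    # into the Python process, so memory stays flat.
    cur.execute("SELECT avg(c1), avg(c2), avg(c3), avg(c4) FROM probes")
    print(cur.fetchone())

    conn.commit()
    cur.close()
    conn.close()

And if you need per-row logic that plain SQL can't express, PL/Python
lets you define that function inside the database itself.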

Jonathan Gardner
jgardner at
