Not sure why this is filling my sys memory

Vincent Davis vincent at vincentdavis.net
Sat Feb 20 20:53:22 EST 2010


On Sat, Feb 20, 2010 at 6:44 PM, Jonathan Gardner <jgardner at jonathangardner.net> wrote:

> With this kind of data set, you should start looking at BDBs or
> PostgreSQL to hold your data. While processing files this large is
> possible, it isn't easy. Your time is better spent letting the DB
> figure out how to arrange your data for you.


I really do need all of it in memory at one time. It is DNA microarray data.
Sure, there are 230,000 rows, but only 4 columns of small numbers. Would it
help to make them float()? I need to at some point. I know in numpy there is
a way to set the type for the whole array, "astype()" I think.
What I don't get is that it shows the size of the dict with all the data to
be only 6424 bytes. What is using up all the memory?
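
A note on the 6424-byte figure: sys.getsizeof() counts only the memory of the object it is handed, not the objects that object refers to, so for a dict it measures the hash table and none of the rows stored in it. The sketch below illustrates the difference. The helper deep_getsizeof() and the sample data are made up for illustration (the real files and loading code are not shown here), and the recursive walk only covers plain dicts, lists, tuples, and sets.

```python
import sys
from array import array


def deep_getsizeof(obj, seen=None):
    """Rough total size of obj plus the objects it contains (hypothetical helper)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # do not count a shared object twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)    # shallow size of this object only
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size


# Made-up stand-in for the real data: 3 "files", 1,000 rows, 4 string fields each.
alldata = {}
for f in range(3):
    alldata["file%d" % f] = [
        ["%d.%d" % (r, c) for c in range(4)] for r in range(1000)
    ]

shallow = sys.getsizeof(alldata)   # the dict's own table, nothing else
deep = deep_getsizeof(alldata)     # dict + lists + every string

# The same values packed as C doubles: 8 bytes per number, no per-item objects.
packed_bytes = 0
for rows in alldata.values():
    packed = array("d", (float(x) for row in rows for x in row))
    packed_bytes += len(packed) * packed.itemsize

print("shallow dict size: %d bytes" % shallow)
print("deep size:         %d bytes" % deep)
print("packed doubles:    %d bytes" % packed_bytes)
```

If the rows are held as lists of strings, each field is a separate Python object with its own overhead, which is likely where the memory goes. Converting to a packed numeric container (the stdlib array above, or a numpy array with a float dtype) should bring the footprint much closer to the raw data size.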

Vincent Davis
720-301-3003
vincent at vincentdavis.net
my blog <http://vincentdavis.net> | LinkedIn <http://www.linkedin.com/in/vincentdavis>


On Sat, Feb 20, 2010 at 6:44 PM, Jonathan Gardner <
jgardner at jonathangardner.net> wrote:

> On Sat, Feb 20, 2010 at 5:07 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> >> Code is below, The files are about 5mb and 230,000 rows. When I have 43
> >> files of them and when I get to the 35th (reading it in) my system gets so
> >> slow that it is nearly functionless. I am on a mac and activity monitor
> >> shows that python is using 2.99GB of memory (of 4GB). (python 2.6 64bit).
> >> The getsizeof() returns 6424 bytes for the alldata . So I am not sure what
> >> is happening.
>
> --
> Jonathan Gardner
> jgardner at jonathangardner.net
>