<blockquote class="gmail_quote" style="margin: 0 0 0 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
On Sat, Feb 20, 2010 at 6:44 PM, Jonathan Gardner <span dir="ltr"><<a href="mailto:jgardner@jonathangardner.net">jgardner@jonathangardner.net</a>></span> wrote:<br>
With this kind of data set, you should start looking at BDBs or PostgreSQL to hold your data. While processing files this large is possible, it isn't easy. Your time is better spent letting the DB figure out how to arrange your data for you.</blockquote><div><br></div><div>I really do need all of it at one time; it is DNA microarray data. Sure, there are 230,000 rows, but only 4 columns of small numbers. Would it help to make them float()? I will need them as floats at some point. I know NumPy has a way to set the type for the whole array, astype() I think.</div>
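Since the question mentions astype(), here is a minimal sketch (the rows below are made-up stand-ins for the parsed microarray data) of how a 230,000 x 4 table packs into a single typed NumPy array; at float32 the raw data is only about 3.7 MB, far smaller than the equivalent nested Python lists of boxed numbers.

```python
import numpy as np

# Made-up rows standing in for the parsed microarray data:
# 230,000 rows x 4 columns of small numbers.
rows = [[i % 7, i % 11, i % 13, i % 17] for i in range(230000)]

# astype() converts the whole array to one element type in a single
# copy; float32 is half the footprint of the default float64.
arr = np.asarray(rows).astype(np.float32)

print(arr.shape)   # (230000, 4)
print(arr.nbytes)  # 3680000 bytes of raw data, i.e. ~3.7 MB
```

Building the array once per file and discarding the intermediate lists keeps only the compact typed storage resident.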
<div>What I don't get is that getsizeof() reports the size of the dict holding all the data as only <span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; ">6424 bytes. What is using up all the memory?</span></div>
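One likely explanation, sketched below with a toy dict (the file names and sizes are invented): sys.getsizeof() is shallow. It counts only the outer dict's own bookkeeping, not the lists and numbers the dict merely references, so a dict whose values hold tens of megabytes of rows can still report a few kilobytes.

```python
import sys

# Toy stand-in for alldata: a dict mapping invented file names to
# lists of 4-column rows.
alldata = {"file%d.txt" % i: [[1.0, 2.0, 3.0, 4.0] for _ in range(1000)]
           for i in range(3)}

# Shallow size: only the dict object itself, not what it points to.
shallow = sys.getsizeof(alldata)

# Rough deep size: also count the keys, the row lists, and the floats.
deep = shallow
for name, rows in alldata.items():
    deep += sys.getsizeof(name) + sys.getsizeof(rows)
    for row in rows:
        deep += sys.getsizeof(row) + sum(sys.getsizeof(x) for x in row)

print("shallow:", shallow)     # a few hundred bytes
print("deep (approx):", deep)  # orders of magnitude larger
```

The gap between the two numbers is the memory that getsizeof() never sees, which is consistent with the 6424-byte reading alongside ~3 GB of actual use.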
<div><font class="Apple-style-span" color="#888888"><br></font><div name="mailplane_signature"> <table><tbody><tr><td width="80">
<img src="http://www.gravatar.com/avatar/226e40fdc55d4597a46279296a616384.png">
</td><td width="10"></td><td width="127" align="center">
<div style="padding-right: 5px; padding-left: 5px;
font-size: 11px; padding-bottom: 5px; color: #666666;
padding-top: 5px">
<p><strong>Vincent Davis<br>
720-301-3003
</strong><br>
<a href="mailto:vincent@vincentdavis.net">vincent@vincentdavis.net</a> </p>
<div style="font-size: 10px">
<a href="http://vincentdavis.net">my blog</a> |
<a href="http://www.linkedin.com/in/vincentdavis">LinkedIn</a></div></div></td></tr><tr></tr></tbody></table></div><br><br><div class="gmail_quote">On Sat, Feb 20, 2010 at 6:44 PM, Jonathan Gardner <span dir="ltr"><<a href="mailto:jgardner@jonathangardner.net">jgardner@jonathangardner.net</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">On Sat, Feb 20, 2010 at 5:07 PM, Vincent Davis <<a href="mailto:vincent@vincentdavis.net">vincent@vincentdavis.net</a>> wrote:<br>
>> Code is below, The files are about 5mb and 230,000 rows. When I have 43<br>
>> files of them and when I get to the 35th (reading it in) my system gets so<br>
>> slow that it is nearly functionless. I am on a mac and activity monitor<br>
>> shows that python is using 2.99GB of memory (of 4GB). (python 2.6 64bit).<br>
>> The getsizeof() returns 6424 bytes for the alldata . So I am not sure what<br>
>> is happening.<br>
<br>
</div>With this kind of data set, you should start looking at BDBs or<br>
PostgreSQL to hold your data. While processing files this large is<br>
possible, it isn't easy. Your time is better spent letting the DB<br>
figure out how to arrange your data for you.<br>
<font color="#888888"><br>
--<br>
Jonathan Gardner<br>
<a href="mailto:jgardner@jonathangardner.net">jgardner@jonathangardner.net</a><br>
</font></blockquote></div><br></div>