It's also extremely surprising to me that reading a 1.8MB file is causing a memory error. That's not a particularly large file, and if reading it raises a MemoryError, there must be something wrong with your Python configuration or build.<div>
<br></div><div>Best,</div><div>Lucas<br><br><div class="gmail_quote">On Tue, May 17, 2011 at 12:26 PM, Lucas Wiman <span dir="ltr"><<a href="mailto:lucas.wiman@gmail.com">lucas.wiman@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><br><br><div class="gmail_quote">On Tue, May 17, 2011 at 10:56 AM, <span dir="ltr"><<a href="mailto:baypiggies-request@python.org" target="_blank">baypiggies-request@python.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I wish to read a large data file (file size is around 1.8 MB) and manipulate<br>
the data in this file. Just reading and writing the first 500 lines of this<br>
file is causing a problem. I wrote:<br>
<br>
fin = open('gene-GS00471-DNA_B01_1101_37-ASM.tsv')<br>
count = 0<br>
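for i in fin.readlines():<br>
&nbsp;&nbsp;&nbsp;&nbsp;print i<br>
&nbsp;&nbsp;&nbsp;&nbsp;count += 1<br>
&nbsp;&nbsp;&nbsp;&nbsp;if count >= 500:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;break<br>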
<br>
and got this error msg:<br>
<br>
Traceback (most recent call last):<br>
File<br>
"H:\genome_4_omics_study\GS000003696-DID\GS00471-DNA_B01_1101_37-ASM\GS00471-DNA_B01\ASM\gene-GS00471-DNA_B01_1101_37-ASM.tsv\test.py",<br>
line 3, in &lt;module&gt;<br>
for i in fin.readlines():<br>
MemoryError<br></blockquote><div><br></div><div>If your data really is a TSV (tab-separated values) file, you should use the csv module to iterate over its rows. Just set the delimiter to '\t'; the docs are at <a href="http://docs.python.org/library/csv.html" target="_blank">http://docs.python.org/library/csv.html</a></div>
<div><br></div><div>You should also generally use the "with" statement when dealing with files, since it closes the file object for you (probably not an issue when you're just reading from a single file, but good practice nonetheless). Here's how I would deal with your situation:</div>
<div><br></div><div><div><font face="'courier new', monospace">import csv</font></div><div><font face="'courier new', monospace"><br></font></div><div><font face="'courier new', monospace">with open('gene-GS00471-DNA_B01_1101_37-ASM.tsv', 'r') as f:</font></div>
<div><font face="'courier new', monospace">&nbsp;&nbsp;&nbsp;&nbsp;r = csv.reader(f, delimiter='\t')</font></div><div><font face="'courier new', monospace">&nbsp;&nbsp;&nbsp;&nbsp;for row in r:</font></div>
<div><font face="'courier new', monospace">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# row is a list of strings that correspond to the columns in your file</font></div><div><font face="'courier new', monospace">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;do_stuff_with_the_row(row)</font></div>
<div><font face="'courier new', monospace"># your file object f is now closed</font></div></div><div><font face="'courier new', monospace"><br></font></div>
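<div><font face="arial, helvetica, sans-serif">If you only want the first 500 rows, as in your test script, something like the following might work. Treat it as a rough sketch rather than a definitive fix: it assumes Python 3 (the newline='' argument does not exist in Python 2), and the helper name first_rows is one I made up for illustration. The idea is that itertools.islice stops pulling rows from the reader once the limit is reached, so there is no need for readlines() or a manual counter.</font></div>

```python
import csv
from itertools import islice


def first_rows(path, limit=500):
    """Return at most `limit` rows of a tab-separated file as lists of strings.

    `first_rows` is a hypothetical helper name, not part of the csv module.
    """
    # newline='' is what the Python 3 csv docs recommend when opening files
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        # islice stops after `limit` rows, so rows past the limit are never parsed
        return list(islice(reader, limit))
```

<div><font face="arial, helvetica, sans-serif">You would then call first_rows('gene-GS00471-DNA_B01_1101_37-ASM.tsv') and loop over the returned list, where each item is one row split into its columns.</font></div>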
<div><font face="arial, helvetica, sans-serif">Best wishes,</font></div><div><font face="arial, helvetica, sans-serif">Lucas Wiman</font></div></div>
</blockquote></div><br></div>