[Tutor] reading very large files

Sander Sweers sander.sweers at gmail.com
Tue May 17 19:49:13 CEST 2011


On Tue, 17 May 2011, 19:20:42 CEST, Vikram K <kpguy1975 at gmail.com> wrote:

> I wish to read a large data file (file size is around 1.8 MB) and
> manipulate the data in this file. Just reading and writing the first 500
> lines of this file is causing a problem. I wrote:

Unless you are very constrained on memory, 1.8 MB is not that much. Maybe you meant GB instead of MB?

> fin = open('gene-GS00471-DNA_B01_1101_37-ASM.tsv')
> count = 0
> for i in fin.readlines():

readlines() reads the whole file and stores it in memory. If the file is really 1.8 GB then I can understand that this would make you run out of memory. Normally with file-like objects in Python you iterate over them directly, which means Python only reads one line at a time instead of the whole file at once. So my suggestion is to remove .readlines() and see how it goes.
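Something like this should work (an untested sketch, reusing the file name and 500-line limit from your snippet):

fin = open('gene-GS00471-DNA_B01_1101_37-ASM.tsv')
count = 0
for line in fin:  # the file object yields one line at a time
    print line
    count += 1
    if count >= 500:
        break
fin.close()

On a recent Python (2.6 or later) you can also open the file with a "with" statement so it gets closed automatically.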

Greets
Sander

>         print i
>         count += 1
>         if count >= 500:
>                 break
> 
> and got this error msg:
> 
> Traceback (most recent call last):
>     File
> "H:\genome_4_omics_study\GS000003696-DID\GS00471-DNA_B01_1101_37-ASM\GS00471-DNA_B01\ASM\gene-GS00471-DNA_B01_1101_37-ASM.tsv\test.py",
> line 3, in <module>
>         for i in fin.readlines():
> MemoryError

