[Baypiggies] reading very large files

Vikram K kpguy1975 at gmail.com
Tue May 17 19:53:52 CEST 2011


Thanks to all for their responses. A correction: my data file is 1.8 GB,
not 1.8 MB, as Simeon pointed out.

On Tue, May 17, 2011 at 1:41 PM, Simeon Franklin <simeonf at gmail.com> wrote:

> On Tue, May 17, 2011 at 10:17 AM, Vikram K <kpguy1975 at gmail.com> wrote:
> > I wish to read a large data file (file size is around 1.8 MB) and
> manipulate
> > the data in this file. Just reading and writing the first 500 lines of
> this
> > file is causing a problem. I wrote:
> >
> > fin = open('gene-GS00471-DNA_B01_1101_37-ASM.tsv')
> > count = 0
> > for i in fin.readlines():
> >     print i
> >     count += 1
> >     if count >= 500:
> >         break
>
> You don't need the readlines() call - the file object itself supports
> iteration over lines; readlines() is there if you specifically want to
> create a list containing all the lines in the file. Try it with
>
> for i in fin:
>
> instead of
>
> for i in fin.readlines():
>
> and see... Were you mistaken above and is the filesize 1.8 GB instead
> of MB? You shouldn't be having memory errors with 1.8MB given a normal
> environment. If you are working with multi-gigabyte files, however,
> you should read David Beazley's awesome Generator Tricks paper
> (http://www.dabeaz.com/generators-uk/). I re-read it on a regular
> basis and always pick up something new...
>
> -regards
> Simeon Franklin
>
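[A minimal sketch of the lazy approach Simeon describes, wrapped in a
helper function for reuse. The function name `head` is my own; the idea
is simply to iterate the file object directly, so lines are read one at
a time and memory use stays constant even for a 1.8 GB file.]

```python
from itertools import islice

def head(path, n=500):
    """Yield the first n lines of a file without loading it all into memory."""
    with open(path) as fin:          # file objects iterate lazily, line by line
        for line in islice(fin, n):  # stop after n lines
            yield line
```

Used against the file from the original post, the loop becomes:
`for line in head('gene-GS00471-DNA_B01_1101_37-ASM.tsv'): print line`
and no manual counter is needed.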