[Tutor] parse text file

Alan Gauld alan.gauld at btinternet.com
Thu Jun 3 19:15:45 CEST 2010

"Colin Talbert" <talbertc at usgs.gov> wrote

> I thought when you did a for uline in input_file each single line would go
> into memory independently, not the entire file.

That's true, but your code snippet showed you using read(),
which reads the whole file...

>> I'm pretty sure that this is not your code, because you can't call len()
>> on a bz2 file. If you try, you get an error:
> You are so correct.  I'd been trying numerous things to read in this file
> and had deleted the code that I meant to put here and so wrote this from
> memory incorrectly.  The code that I wrote should have been:
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
> str=input_file.read()
> len(str)

This again uses read(), which reads the whole file.

> Which is also the number returned when you sum the length of all the lines
> returned in a for line in file with:
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
> lengthz = 0
> for uline in input_file:
>    lengthz = lengthz + len(uline)

I'm not sure how

for line in file

will work for binary files. It may read the whole thing since
the concept of lines really only applies to text. So it may
be the same result as using read()

Try looping using read(n) where n is some buffer size
(1024 might be a good value?).
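
A rough sketch of that read(n) loop, in case it helps. The function
name decompressed_length and the chunk_size parameter are made up for
illustration; only bz2.BZ2File and its read(size) method come from the
standard library. It counts the decompressed bytes while holding no more
than one chunk in memory at a time:

```python
import bz2

def decompressed_length(path, chunk_size=1024):
    """Count decompressed bytes, reading at most chunk_size bytes per call.

    The name and the default chunk size are illustrative assumptions.
    """
    total = 0
    input_file = bz2.BZ2File(path, 'rb')
    try:
        while True:
            chunk = input_file.read(chunk_size)
            if not chunk:  # read() returns an empty result at end of file
                break
            total += len(chunk)
    finally:
        input_file.close()
    return total
```

You would call it as decompressed_length(r'C:\temp\planet-latest.osm.bz2')
and could try a larger chunk_size to see whether it makes any difference
to the speed.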


Alan Gauld
Author of the Learn to Program web site
