[Tutor] parse text file

Colin Talbert talbertc at usgs.gov
Thu Jun 3 21:02:22 CEST 2010


Dave,
        I think you are probably right about using bz2.BZ2Decompressor.  I 
couldn't find any examples of it in use, and I wasn't having any luck getting 
it to work based on the documentation.  Maybe I should try harder on this 
front.

Colin Talbert
GIS Specialist
US Geological Survey - Fort Collins Science Center
2150 Centre Ave. Bldg. C
Fort Collins, CO 80526

(970) 226-9425
talbertc at usgs.gov




From: Dave Angel <davea at ieee.org>
To: Colin Talbert <talbertc at usgs.gov>
Cc: Steven D'Aprano <steve at pearwood.info>, tutor at python.org
Date: 06/03/2010 12:36 PM
Subject: Re: [Tutor] parse text file



Colin Talbert wrote:
> <snip>
> You are so correct.  I'd been trying numerous things to read in this file
> and had deleted the code that I meant to put here and so wrote this from
> memory incorrectly.  The code that I wrote should have been:
>
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
> data = input_file.read()   # renamed from str to avoid shadowing the builtin
> len(data)
>
> Which indeed does return only 900000.
>
> Which is also the number returned when you sum the length of all the lines
> returned in a for line in file with:
>
>
> import bz2
> input_file = bz2.BZ2File(r'C:\temp\planet-latest.osm.bz2','rb')
> lengthz = 0
> for uline in input_file:
>     lengthz = lengthz + len(uline)
>
> print lengthz
>
> <snip>
> 
>
Seems to me for such a large file you'd have to use 
bz2.BZ2Decompressor.  I have no experience with it, but its purpose is 
for sequential decompression -- decompression where not all the data is 
simultaneously available in memory.
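
A minimal sketch of that sequential approach, written for Python 3.3 or
later (where BZ2Decompressor has an eof attribute) rather than the Python 2
used above.  The function name decompressed_length and the chunk_size
parameter are made up for illustration.  It also assumes, without
confirmation from this thread, that the file may contain several
concatenated bz2 streams, so it starts a fresh decompressor whenever one
stream ends:

```python
import bz2


def decompressed_length(path, chunk_size=1024 * 1024):
    """Return the total decompressed size of a bz2 file, read in chunks.

    Hypothetical helper: only chunk_size bytes of compressed input are
    held at a time, and the decompressed output is counted, not stored.
    """
    total = 0
    decompressor = bz2.BZ2Decompressor()
    with open(path, 'rb') as compressed:
        while True:
            chunk = compressed.read(chunk_size)
            if not chunk:
                break
            while chunk:
                total += len(decompressor.decompress(chunk))
                if decompressor.eof:
                    # This stream is finished.  Any leftover bytes are
                    # assumed to start another bz2 stream, so hand them
                    # to a new decompressor.
                    chunk = decompressor.unused_data
                    decompressor = bz2.BZ2Decompressor()
                else:
                    chunk = b''
    return total
```

Calling decompressed_length(r'C:\temp\planet-latest.osm.bz2') would then
count the bytes without holding the whole file in memory.  Whether it
reports more than 900000 depends on the multi-stream assumption being
right for this file.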

DaveA


