[Tutor] parse text file

Steven D'Aprano steve at pearwood.info
Wed Jun 2 02:12:06 CEST 2010


On Wed, 2 Jun 2010 07:40:33 am Colin Talbert wrote:
> I am also experiencing this same problem (also on an OSM bz2
> file).  It appears to be working, but then partway through reading a
> file it simply ends.  I did track down that the file length is always
> 900000, so it appears to be related to some sort of buffer constraint.

Without seeing your text file, and the code you use to read the text 
file, there's no way of telling what is going on, but I can guess the 
most likely causes:

(1) Your text file is actually only 900,000 bytes long, and so there's 
no problem at all.
(2) There's a bug in your code so that you stop reading after 900,000 
bytes.
(3) You're on Windows, and the text file contains an End-Of-File 
character ^Z after 900,000 bytes, which Windows still treats as the 
end of the file in text mode, for backward compatibility with DOS.

And a distant (VERY distant) number 4, there's a bug in the 
implementation of read() in Python which somehow nobody has noticed 
before now.
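
One way to narrow down which of these applies is to compare the size 
the operating system reports with the number of bytes you can actually 
read in binary mode. This is only a rough sketch: the function names 
count_bytes_read and compare_sizes are made up for illustration, and 
the chunk size is arbitrary.

```python
import os


def count_bytes_read(path, chunk_size=64 * 1024):
    """Read the file in binary mode and return how many bytes came back."""
    total = 0
    # "rb" avoids any text-mode handling of ^Z or line endings on Windows.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty result means end of file
                break
            total += len(chunk)
    return total


def compare_sizes(path):
    """Return (size reported by the OS, bytes actually read)."""
    return os.path.getsize(path), count_bytes_read(path)
```

If both numbers are 900,000, the file really is that short (cause 1). 
If the size on disk is larger than what your own code reads, suspect 
your code (cause 2) or text-mode handling (cause 3).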

As for your second issue, reading bz2 files:

> import bz2
>
> input_file = bz2.BZ2File(r"C:\temp\planet-latest.osm.bz2","r")

You're opening a binary file in text mode. I'm pretty sure that is not 
going to work well. Try passing 'rb' as the mode instead.
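
Something along these lines might be a starting point. It is a sketch, 
not a tested fix for your file: count_decompressed_bytes is a name I 
made up, and reading in chunks rather than with a single read() is my 
own assumption, on the grounds that a planet dump may be too big to 
hold in memory comfortably.

```python
import bz2


def count_decompressed_bytes(path, chunk_size=1024 * 1024):
    """Decompress a bz2 file chunk by chunk and return the total byte count."""
    total = 0
    input_file = bz2.BZ2File(path, "rb")  # explicit binary mode
    try:
        while True:
            chunk = input_file.read(chunk_size)
            if not chunk:  # empty result: no more decompressed data
                break
            total += len(chunk)
    finally:
        input_file.close()
    return total
```

You would call it as 
count_decompressed_bytes(r"C:\temp\planet-latest.osm.bz2") and see 
whether the total still stops at 900000.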

> try:
>     all_data = input_file.read()
>     print str(len(all_data))

You don't need to call str() before calling print. print is perfectly 
happy to operate on integers:

    print len(all_data)

will work.


-- 
Steven D'Aprano

