[Tutor] parse text file
Steven D'Aprano
steve at pearwood.info
Wed Jun 2 02:12:06 CEST 2010
On Wed, 2 Jun 2010 07:40:33 am Colin Talbert wrote:
> I am also experiencing this same problem. (Also on an OSM bz2
> file). It appears to be working but then partway through reading a
> file it simply ends. I did track down that file length is always
> 900000 so it appears to be related to some sort of buffer constraint.
Without seeing your text file, and the code you use to read the text
file, there's no way of telling what is going on, but I can guess the
most likely causes:
(1) Your text file is actually only 900,000 bytes long, and so there's
no problem at all.
(2) There's a bug in your code so that you stop reading after 900,000
bytes.
(3) You're on Windows, the file is opened in text mode, and it contains
an End-Of-File character (^Z) after 900,000 bytes; text-mode reads on
Windows stop at that character for backward compatibility with DOS.
And a distant (VERY distant) number 4, there's a bug in the
implementation of read() in Python which somehow nobody has noticed
before now.
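A quick way to tell causes (1) and (3) apart is to compare the size on
disk with what a binary-mode read returns, and to look for a ^Z byte
(0x1A). This is a minimal sketch in Python 3; the function name
diagnose is just an illustration, not part of any library:

```python
import os

def diagnose(path):
    """Return (size on disk, bytes read in binary mode, offset of first ^Z).

    The offset is -1 when the file contains no ^Z (0x1A) byte.
    """
    size_on_disk = os.path.getsize(path)
    # Binary mode: no newline translation and no ^Z end-of-file handling.
    with open(path, "rb") as f:
        data = f.read()
    ctrl_z = data.find(b"\x1a")
    return size_on_disk, len(data), ctrl_z
```

If the first number is 900000, you're looking at cause (1). If the file
is larger but a ^Z sits at offset 900000, cause (3) is the likely culprit.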
As for your second issue, reading bz2 files:
> import bz2
>
> input_file = bz2.BZ2File(r"C:\temp\planet-latest.osm.bz2","r")
You're opening a binary file in text mode. I'm pretty sure that is not
going to work well. Try passing 'rb' as the mode instead.
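Here is a sketch of opening the file with 'rb', written for Python 3. One
assumption on my part: rather than calling read() once on the whole
planet file (which can be enormous once decompressed), it reads in
fixed-size chunks and counts the decompressed bytes. The name
count_decompressed_bytes and the chunk_size default are illustrative only:

```python
import bz2

def count_decompressed_bytes(path, chunk_size=1 << 20):
    """Read a .bz2 file in binary mode, one chunk at a time, and return
    the total number of decompressed bytes."""
    total = 0
    with bz2.BZ2File(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of data
                break
            total += len(chunk)
    return total
```

If the total comes out suspiciously round (like 900000), the reading side
is stopping early rather than the file really being that short.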
> try:
>     all_data = input_file.read()
>     print str(len(all_data))
You don't need to call str() before calling print. print is perfectly
happy to operate on integers:
print len(all_data)
will work.
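The same holds in Python 3, where print is a function: it converts its
argument with str() on its own. This small check (the helper name
printed is hypothetical) captures what print writes and shows the
explicit str() call makes no difference:

```python
import io
from contextlib import redirect_stdout

def printed(value):
    """Return exactly the text that print(value) writes to stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        print(value)
    return buf.getvalue()

n = len(b"x" * 900000)
# print() already calls str() on its argument, so wrapping it is redundant.
assert printed(n) == printed(str(n)) == "900000\n"
```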
--
Steven D'Aprano