readlines() reading incorrect number of lines?
John Machin
sjmachin at lexicon.net
Thu Dec 20 16:07:15 EST 2007
On Dec 21, 7:41 am, Wojciech Gryc <wojci... at gmail.com> wrote:
> Hi,
>
> Python 2.5, on Windows XP. Actually, I think you may be right about
> \x1a -- there's a few lines that definitely have some strange
> character sequences, so this would make sense... Would you happen to
> know how I can actually fix this (e.g. replace the character)? Since
> Python doesn't see the rest of the file, I don't even know how to get
> to it to fix the problem... Due to the nature of the data I'm working
> with, manual editing is also not an option.
>
Please don't top-post.
Quick hack to remove all occurrences of '\x1a' (untested):
fin = open('old_file', 'rb') # note b BINARY
fout = open('new_file', 'wb')
blksz = 1024 * 1024
while True:
    blk = fin.read(blksz)
    if not blk: break
    fout.write(blk.replace('\x1a', ''))
fout.close()
fin.close()
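For anyone reading this on Python 3, here is a rough sketch of the same idea wrapped in a function. The name strip_byte and its parameters are made up for illustration; the only real difference is that data read in binary mode is bytes, so the search and replacement values need a b prefix. Because the target is a single byte, it cannot be split across two blocks, so replacing block by block is safe.

```python
def strip_byte(src_path, dst_path, unwanted=b'\x1a', blksz=1024 * 1024):
    """Copy src_path to dst_path, dropping every occurrence of one byte.

    Hypothetical helper: the name and signature are illustrative only.
    Returns the number of bytes removed.
    """
    removed = 0
    # Binary mode on both sides, so no newline or Ctrl-Z translation happens.
    with open(src_path, 'rb') as fin, open(dst_path, 'wb') as fout:
        while True:
            blk = fin.read(blksz)
            if not blk:
                break
            removed += blk.count(unwanted)
            fout.write(blk.replace(unwanted, b''))
    return removed
```

Calling strip_byte('old_file', 'new_file') should behave like the loop above and also tell you how many bytes were dropped.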
You may however want to investigate the "strange character sequences"
that have somehow appeared in your file after you built it
yourself :-)
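One possible way to do that investigating, sketched for Python 3 and untested against your data: open the file in binary mode and list where each '\x1a' sits, along with a few bytes on either side. The function name find_byte and the context parameter are invented for this example.

```python
def find_byte(path, target=b'\x1a', context=16, blksz=1024 * 1024):
    """Return a list of (offset, nearby_bytes) for each occurrence of target.

    Hypothetical helper: target is assumed to be a single byte, so an
    occurrence can never straddle two blocks.
    """
    offsets = []
    hits = []
    with open(path, 'rb') as f:
        base = 0  # file offset of the start of the current block
        while True:
            blk = f.read(blksz)
            if not blk:
                break
            pos = blk.find(target)
            while pos != -1:
                offsets.append(base + pos)
                pos = blk.find(target, pos + 1)
            base += len(blk)
        # Second pass: seek back and grab some context around each hit.
        for off in offsets:
            start = max(0, off - context)
            f.seek(start)
            hits.append((off, f.read(off - start + 1 + context)))
    return hits
```

Printing the repr() of each nearby_bytes value should show what surrounds the stray characters, which may hint at how they got into the file.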
HTH,
John