Python 2.5, problems reading large (> 4 GBytes) files on win2k
casevh at gmail.com
Sat Mar 3 03:03:06 EST 2007
On Mar 2, 10:09 am, padu... at cisco.com wrote:
> Folks,
>
> I've a Python 2.5 app running on 32 bit Win 2k SP4 (NTFS volume).
> Reading a file of 13 GBytes, one line at a time. It appears that,
> once the read line passes the 4 GByte boundary, I am getting
> occasional random line concatenations. Input file is confirmed good
> via UltraEdit. Groovy version of the same app runs fine.
>
> Any ideas?
>
> Cheers
It appears to be a bug. I am able to reproduce the problem with the
code fragment below. It writes a 12GB file by repeating, 1500000 times,
a set of 127 lines whose lengths range from 0 to 126 bytes, then reads
the file back and checks each line's length. It fails on W2K SP4 with
both Python 2.4 and 2.5. It works correctly on Linux (Ubuntu 6.10).

I have reported it on SourceForge as bug 1672853.
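
As a quick sanity check on the 12GB figure, the file size can be worked out from the loop bounds alone. The sketch below is not part of the original report; the helper name expected_file_size and the newline_len parameter are made up for illustration. It assumes that Windows text mode writes each '\n' as the two bytes '\r\n' (newline_len=2), and that Linux writes a single byte (newline_len=1).

```python
def expected_file_size(end=126, loops=1500000, newline_len=2):
    """Bytes written by one run of the test file generator.

    Each pass writes end+1 lines holding 0, 1, ..., end payload bytes,
    and every line is followed by newline_len newline bytes.
    newline_len is an assumption: 2 for Windows text mode, 1 for Linux.
    """
    per_pass = sum(t + newline_len for t in range(end + 1))
    return per_pass * loops


FOUR_GIB = 4 * 2 ** 30  # 4294967296, the boundary mentioned in the question

print(expected_file_size())               # 12382500000
print(expected_file_size(newline_len=1))  # 12192000000
print(expected_file_size() > FOUR_GIB)    # True
```

Either way the file is close to three times the 4 GByte boundary, so the reader crosses that boundary well before the end of the file.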
# Read and write a huge file.
import sys

def write_file(end = 126, loops = 150, fname='bigfile'):
    fh = open(fname, 'w')
    buff = 'A' * end
    for k in range(loops):
        for t in range(end+1):
            fh.write(buff[:t]+'\n')
    fh.close()

def read_file(end = 126, fname = 'bigfile'):
    fh = open(fname, 'r')
    offset = 0
    loops = 0
    for rec in fh:
        if offset != len(rec.strip()):
            print 'Error at loop:', loops
            print 'Expected record length:', offset
            print 'Actual record length:', len(rec.strip())
            sys.exit(0)
        offset += 1
        if offset > end:
            offset = 0
            loops += 1
            if not loops % 10000: print loops
    fh.close()

if __name__ == '__main__':
    write_file(loops=1500000)
    read_file()
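
To narrow down where the first bad line falls relative to the 4 GByte boundary, a variant of the reader can open the file in binary mode and count bytes by hand. This is only a diagnostic sketch, not a confirmed workaround: whether binary mode avoids the problem is an assumption that would need testing on an affected machine. The function name first_bad_line is invented here, and the code is written for a current Python 3 interpreter rather than the Python 2.4/2.5 used above.

```python
def first_bad_line(fname, end=126):
    """Return (byte_offset, expected_len, actual_len) for the first line
    whose length breaks the 0..end cycle, or None if every line matches.

    The file is opened in binary mode, so no newline translation is done
    by the runtime; the byte offset is accumulated manually rather than
    taken from tell().
    """
    expected = 0
    position = 0  # bytes consumed before the current line
    with open(fname, 'rb') as fh:
        for rec in fh:
            # Drop the line terminator, whether it is '\n' or '\r\n'.
            actual = len(rec.rstrip(b'\r\n'))
            if actual != expected:
                return position, expected, actual
            position += len(rec)
            expected = (expected + 1) % (end + 1)
    return None
```

Calling first_bad_line('bigfile') on a good file returns None. On a bad read it returns the byte offset at which the unexpected line starts, which can be compared against 4294967296 (4 * 2**30) to see whether the failures begin at the boundary.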
casevh