[New-bugs-announce] [issue1141] reading large files

christen report at bugs.python.org
Mon Sep 10 14:45:04 CEST 2007


New submission from christen:

September 11, 2007: I downloaded Python 3k (py3k).

The good news:
Under Windows, Python 3k correctly reads files larger than 4 GB (in
contrast to Python 2.5, which skips some lines; see below).
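Because the two interpreters disagree on the line count, an independent count can help decide which one is right. The following is a minimal sketch and is not part of the original report. It reads the file in binary chunks and counts newline bytes, so it bypasses the text-mode line iterator entirely. The function name `count_lines` and the 1 MiB chunk size are illustrative assumptions. The sketch also assumes lines end in `\n` or `\r\n`, so a bare `\r` is not counted as a line break.

```python
def count_lines(path, chunk_size=1 << 20):
    """Count lines by counting b'\\n' bytes in fixed-size binary chunks.

    Hypothetical helper: assumes lines end in '\\n' or '\\r\\n'.
    """
    count = 0
    last = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last = chunk[-1:]  # remember the final byte seen so far
    # A last line without a trailing newline still counts as a line,
    # matching what `for li in fichin` would report.
    if last and last != b"\n":
        count += 1
    return count
```

For the GenBank file in this report, `count_lines(r'D:\pythons\16s\total_gb_161_16S.gb')` should agree with whichever interpreter is counting correctly.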

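The bad news: Python 3k is much slower than Python 2.5, roughly 34 times
slower on this test (about 22 minutes versus about 39 seconds for the whole
file; see the results below).
The test code is below. It reads a 4.9 GB file of 81,017,719 lines (a GenBank
entry of bacterial sequences).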

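#######################
import time

print(time.localtime())
# Open the 4.9 GB GenBank file in text mode (the default)
fichin = open(r'D:\pythons\16s\total_gb_161_16S.gb')
t0 = time.localtime()
print(t0)
i = 0

# Count the lines, printing a timestamp every million lines.
# Under Python 2.5, print is a statement, so print(a, b) prints a tuple;
# this is why the 2.5 output below is parenthesized.
for li in fichin:
    i += 1
    if i % 1000000 == 0:
        print(i, time.localtime())

fichin.close()
print()
print(i)
print(time.localtime())
#########################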
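To measure elapsed time directly instead of comparing `time.localtime()` tuples by eye, the same loop could be timed with a high-resolution clock. This is a hedged variant, not the reporter's script. `time_line_loop` is a hypothetical helper, and `time.perf_counter()` and f-strings require Python 3.6 or later, not the early 3.0 build tested here. Running the loop in both text mode (`"r"`) and binary mode (`"rb"`) may help show how much of the cost comes from text decoding and newline translation.

```python
import sys
import time


def time_line_loop(path, mode="r"):
    """Iterate over every line of `path`; return (line_count, elapsed_seconds).

    Text mode ("r") decodes with the locale's default encoding, which is
    assumed to be fine for an ASCII GenBank file.
    """
    start = time.perf_counter()
    n = 0
    with open(path, mode) as f:
        for _ in f:
            n += 1
    return n, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. the GenBank file from the report
    for mode in ("r", "rb"):
        n, secs = time_line_loop(path, mode)
        print(f"mode={mode!r}: {n} lines in {secs:.1f} s")
```

In binary mode lines are split only on `b"\n"`, whereas text mode also treats a bare `\r` as a line end. For an ordinary `\n` or `\r\n` file, the two counts should match.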


I got the following results on the same machine (Windows XP 64-bit), using
either Python 3k or Python 2.5.
As soon as my BSD and Linux machines have finished their current calculations,
I will run the same test on them.
Best
Richard Christen


python 3k

(2007, 9, 10, 13, 53, 36, 0, 253, 1)
(2007, 9, 10, 13, 53, 36, 0, 253, 1)
1000000 (2007, 9, 10, 13, 53, 49, 0, 253, 1)
2000000 (2007, 9, 10, 13, 54, 3, 0, 253, 1)
3000000 (2007, 9, 10, 13, 54, 18, 0, 253, 1)
4000000 (2007, 9, 10, 13, 54, 32, 0, 253, 1)
5000000 (2007, 9, 10, 13, 54, 47, 0, 253, 1)
....
77000000 (2007, 9, 10, 14, 14, 55, 0, 253, 1)
78000000 (2007, 9, 10, 14, 15, 9, 0, 253, 1)
79000000 (2007, 9, 10, 14, 15, 22, 0, 253, 1)
80000000 (2007, 9, 10, 14, 15, 36, 0, 253, 1)
81000000 (2007, 9, 10, 14, 15, 49, 0, 253, 1)

81017719    # this is the correct number of lines
(2007, 9, 10, 14, 15, 50, 0, 253, 1)


Python 2.5

(2007, 9, 10, 14, 18, 33, 0, 253, 1)
(2007, 9, 10, 14, 18, 33, 0, 253, 1)
(1000000, (2007, 9, 10, 14, 18, 34, 0, 253, 1))
(2000000, (2007, 9, 10, 14, 18, 34, 0, 253, 1))
(3000000, (2007, 9, 10, 14, 18, 35, 0, 253, 1))
(4000000, (2007, 9, 10, 14, 18, 35, 0, 253, 1))
(5000000, (2007, 9, 10, 14, 18, 36, 0, 253, 1))
...
(77000000, (2007, 9, 10, 14, 19, 10, 0, 253, 1))
(78000000, (2007, 9, 10, 14, 19, 11, 0, 253, 1))
(79000000, (2007, 9, 10, 14, 19, 11, 0, 253, 1))
(80000000, (2007, 9, 10, 14, 19, 12, 0, 253, 1))
(81000000, (2007, 9, 10, 14, 19, 12, 0, 253, 1))
()
81014962    # Python 2.5 missed 2,757 lines!
(2007, 9, 10, 14, 19, 12, 0, 253, 1)
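The headline figures can be recomputed from the timestamps above. The snippet below simply does that arithmetic. Note that `time.localtime()` has only one-second resolution, so the 39-second figure for Python 2.5, and therefore the ratio, is approximate.

```python
from datetime import datetime

# (year, month, day, hour, minute, second) copied from the time.localtime() tuples above
py3k_start = datetime(2007, 9, 10, 13, 53, 36)
py3k_end = datetime(2007, 9, 10, 14, 15, 50)
py25_start = datetime(2007, 9, 10, 14, 18, 33)
py25_end = datetime(2007, 9, 10, 14, 19, 12)

py3k_secs = (py3k_end - py3k_start).total_seconds()
py25_secs = (py25_end - py25_start).total_seconds()
missed = 81017719 - 81014962  # correct count minus the Python 2.5 count

print(f"Python 3k : {py3k_secs:.0f} s")          # Python 3k : 1334 s
print(f"Python 2.5: {py25_secs:.0f} s")          # Python 2.5: 39 s
print(f"slowdown  : {py3k_secs / py25_secs:.1f}x")  # slowdown  : 34.2x
print("lines missed by 2.5:", missed)            # lines missed by 2.5: 2757
```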

----------
components: Tests
messages: 55777
nosy: Richard.Christen at unice.fr
severity: normal
status: open
title: reading large files
type: behavior
versions: Python 3.0

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1141>
__________________________________

