
On 11/16/10 10:01 AM, Christopher Barker wrote:
> OK -- I'll whip up a test similar to yours -- stay tuned!
Here's what I've done:
import numpy as np
from maproomlib.utility import file_scanner

def gen_file():
    f = file('test.dat', 'w')
    for i in range(1200):
        f.write('1 ' * 2048)
        f.write('\n')
    f.close()

def read_file1():
    """ read unknown length: doubles"""
    f = file('test.dat')
    arr = file_scanner.FileScan(f)
    f.close()
    return arr

def read_file2():
    """ read known length: doubles"""
    f = file('test.dat')
    arr = file_scanner.FileScanN(f, 1200*2048)
    f.close()
    return arr

def read_file3():
    """ read known length: singles"""
    f = file('test.dat')
    arr = file_scanner.FileScanN_single(f, 1200*2048)
    f.close()
    return arr

def read_fromfile1():
    """ read unknown length with fromfile(): singles"""
    f = file('test.dat')
    arr = np.fromfile(f, dtype=np.float32, sep=' ')
    f.close()
    return arr

def read_fromfile2():
    """ read unknown length with fromfile(): doubles"""
    f = file('test.dat')
    arr = np.fromfile(f, dtype=np.float64, sep=' ')
    f.close()
    return arr

def read_fromstring1():
    """ read unknown length with fromstring(): singles"""
    f = file('test.dat')
    str = f.read()
    arr = np.fromstring(str, dtype=np.float32, sep=' ')
    f.close()
    return arr
And the results (IPython's timeit):
In [40]: timeit test.read_fromfile1()
1 loops, best of 3: 561 ms per loop

In [41]: timeit test.read_fromfile2()
1 loops, best of 3: 570 ms per loop

In [42]: timeit test.read_file1()
1 loops, best of 3: 336 ms per loop

In [43]: timeit test.read_file2()
1 loops, best of 3: 341 ms per loop

In [44]: timeit test.read_file3()
1 loops, best of 3: 515 ms per loop

In [46]: timeit test.read_fromstring1()
1 loops, best of 3: 301 ms per loop
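If anyone wants to rerun these outside IPython, here's a rough driver using the stdlib timeit module (just a sketch; it assumes the functions above are saved as test.py and that test.gen_file() has already been run to create test.dat):

import timeit

readers = ['read_file1', 'read_file2', 'read_file3',
           'read_fromfile1', 'read_fromfile2', 'read_fromstring1']

for name in readers:
    # time three calls of each reader, report the average per call
    t = timeit.timeit('test.%s()' % name, setup='import test', number=3)
    print '%s: %.0f ms per call' % (name, 1000.0 * t / 3)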
So my file_scanner is faster than fromfile(), but not radically so. However, reading the whole file into a string and then using fromstring() is, in fact, the fastest method -- interesting -- it shows you why you need to profile!
Also, with my code, reading singles is slower than reading doubles -- odd. Perhaps the C lib's fscanf reads doubles anyway and then converts them to singles?
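One quick (untested) way to poke at that hypothesis from Python: read as doubles with the existing FileScanN and downcast afterwards. If that is still faster than FileScanN_single, the slowdown is in the C parsing path rather than in the double-to-single conversion:

def read_file_double_then_cast():
    """ read known length as doubles, then downcast to singles"""
    f = file('test.dat')
    arr = file_scanner.FileScanN(f, 1200*2048)
    f.close()
    # one extra pass over the data to convert to float32
    return arr.astype(np.float32)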
Anyway, for my needs, my file_scanner and fromfile() are fast enough, and much faster than parsing the files with Python. My issue with fromfile is flexibility and robustness -- it's buggy in the face of ill-formed files. See the list archives and the bug reports for more detail.
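To illustrate the kind of thing I mean by ill-formed, here's a rough sketch (exact behaviour depends on your numpy version -- it may silently return a truncated or wrong array rather than raising an error):

def demo_bad_file():
    # write a file with a stray non-numeric token in the middle
    f = file('bad.dat', 'w')
    f.write('1.0 2.0 oops 4.0\n')
    f.close()
    arr = np.fromfile('bad.dat', dtype=np.float64, sep=' ')
    print arr    # possibly just [ 1.  2.] -- and no exception raised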
Still, it seems your very basic method is indeed a faster way to go.
I've enclosed the files. It's currently built as part of a larger lib, so no setup.py -- though it could be written easily enough.
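For reference, a minimal setup.py would look something like the sketch below. I'm guessing the C source is called file_scanner.c and builds as a plain C extension against the numpy headers -- adjust to whatever the enclosed file is actually called:

from distutils.core import setup, Extension
import numpy as np

setup(
    name='file_scanner',
    ext_modules=[Extension('file_scanner',
                           sources=['file_scanner.c'],
                           # numpy C API headers
                           include_dirs=[np.get_include()])],
)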
-Chris