On 11/16/10 10:01 AM, Christopher Barker wrote:
OK -- I'll whip up a test similar to yours -- stay tuned!
Here's what I've done:

    import numpy as np
    from maproomlib.utility import file_scanner

    def gen_file():
        f = open('test.dat', 'w')
        for i in range(1200):
            f.write('1 ' * 2048)
            f.write('\n')
        f.close()

    def read_file1():
        """ read unknown length: doubles"""
        f = open('test.dat')
        arr = file_scanner.FileScan(f)
        f.close()
        return arr

    def read_file2():
        """ read known length: doubles"""
        f = open('test.dat')
        arr = file_scanner.FileScanN(f, 1200*2048)
        f.close()
        return arr

    def read_file3():
        """ read known length: singles"""
        f = open('test.dat')
        arr = file_scanner.FileScanN_single(f, 1200*2048)
        f.close()
        return arr

    def read_fromfile1():
        """ read unknown length with fromfile(): singles"""
        f = open('test.dat')
        arr = np.fromfile(f, dtype=np.float32, sep=' ')
        f.close()
        return arr

    def read_fromfile2():
        """ read unknown length with fromfile(): doubles"""
        f = open('test.dat')
        arr = np.fromfile(f, dtype=np.float64, sep=' ')
        f.close()
        return arr

    def read_fromstring1():
        """ read unknown length with fromstring(): singles"""
        f = open('test.dat')
        text = f.read()  # renamed from `str` to avoid shadowing the builtin
        arr = np.fromstring(text, dtype=np.float32, sep=' ')
        f.close()
        return arr

And the results (ipython's timeit):

    In [40]: timeit test.read_fromfile1()
    1 loops, best of 3: 561 ms per loop

    In [41]: timeit test.read_fromfile2()
    1 loops, best of 3: 570 ms per loop

    In [42]: timeit test.read_file1()
    1 loops, best of 3: 336 ms per loop

    In [43]: timeit test.read_file2()
    1 loops, best of 3: 341 ms per loop

    In [44]: timeit test.read_file3()
    1 loops, best of 3: 515 ms per loop

    In [46]: timeit test.read_fromstring1()
    1 loops, best of 3: 301 ms per loop

So my file_scanner is faster than fromfile(), but not radically so (roughly 340 ms vs. 570 ms for doubles). However, reading the whole file into a string and then using fromstring() is, in fact, the fastest method at about 300 ms -- interesting -- shows you why you need to profile!

Also, with my code, reading singles is slower than doubles (515 ms vs. 341 ms) -- odd. Perhaps the C lib fscanf reads doubles anyway, then converts to singles?
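For anyone who wants to repeat the NumPy-only part of this comparison without maproomlib, here is a minimal sketch. It is not the test module from this message: the function names, the temporary-file handling, and the `compare()` helper are all assumptions made for illustration, and it only covers the fromfile() and fromstring() cases for doubles. Timings will differ by machine and NumPy version, so no expected numbers are given.

```python
import os
import tempfile
import timeit

import numpy as np


def gen_file(path, rows=1200, cols=2048):
    """Write `rows` lines of `cols` space-separated ones (same layout as test.dat)."""
    with open(path, "w") as f:
        for _ in range(rows):
            f.write("1 " * cols)
            f.write("\n")


def read_fromfile(path):
    """Let NumPy parse the whitespace-separated text straight from the file."""
    return np.fromfile(path, dtype=np.float64, sep=" ")


def read_fromstring(path):
    """Read the whole file into memory first, then parse the string."""
    with open(path) as f:
        text = f.read()
    return np.fromstring(text, dtype=np.float64, sep=" ")


def compare(rows=1200, cols=2048, repeat=3):
    """Return (best_fromfile, best_fromstring), each in seconds per call.

    Hypothetical helper: mimics "best of 3" by taking the minimum of
    `repeat` single-call timings.
    """
    fd, path = tempfile.mkstemp(suffix=".dat")
    os.close(fd)
    try:
        gen_file(path, rows, cols)
        # Sanity check: both readers must return every value in the file.
        assert read_fromfile(path).size == rows * cols
        assert read_fromstring(path).size == rows * cols
        t_file = min(timeit.repeat(lambda: read_fromfile(path),
                                   number=1, repeat=repeat))
        t_string = min(timeit.repeat(lambda: read_fromstring(path),
                                     number=1, repeat=repeat))
        return t_file, t_string
    finally:
        os.remove(path)


if __name__ == "__main__":
    t_file, t_string = compare()
    print("fromfile:   %.0f ms" % (t_file * 1e3))
    print("fromstring: %.0f ms" % (t_string * 1e3))
```

Taking the minimum over a few runs matches what ipython's timeit reports as "best of 3", which keeps the comparison less sensitive to disk caching and other background noise.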
Anyway, for my needs, my file_scanner and fromfile() are fast enough, and much faster than parsing the files with Python. My issue with fromfile() is flexibility and robustness -- it's buggy in the face of ill-formed files. See the list archives and the bug reports for more detail.

Still, it seems your very basic method is indeed a faster way to go.

I've enclosed the files. It's currently built as part of a larger lib, so no setup.py -- though it could be written easily enough.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov