
I am wrapping up a small package to parse a particular ASCII-encoded file format generated by a program we use heavily here at the lab. (In the unlikely event that you work at a synchrotron, use Certified Scientific's "spec" program, and are actually interested, the code is currently available at https://github.com/darrendale/praxes/tree/specformat/praxes/io/spec/ .)
I have been benchmarking the project against another Python package developed by a colleague, which is an extension module written in pure C. My Python/Cython project takes about twice as long to parse and index a file (~0.8 seconds for 100MB), which is acceptable. However, actually converting the ASCII strings to numpy arrays, which is done using numpy.fromstring, takes about 10 times longer than the extension module. So I am wondering about the performance of np.fromstring:

    import time
    import numpy as np

    s = b'1 ' * 2048 * 1200
    d = time.time()
    # sep=' ' selects text mode; without it the bytes are read as raw binary
    x = np.fromstring(s, sep=' ')
    print(time.time() - d)
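
A slightly more repeatable way to time this might look like the sketch below. It assumes that text mode (a non-empty sep) is what is intended here. The names make_text, parse_fromstring, and parse_split are hypothetical helpers made up for this illustration, and the split-based parser is only a pure-Python baseline for comparison, not the C extension module mentioned above.

```python
import timeit

import numpy as np


def make_text(n):
    # n copies of "1" joined by single spaces; a stand-in for real spec data
    return " ".join(["1"] * n)


def parse_fromstring(text):
    # A non-empty sep makes fromstring parse ASCII numbers (text mode)
    return np.fromstring(text, dtype=float, sep=" ")


def parse_split(text):
    # Baseline: split in Python, then let NumPy convert the tokens to floats
    return np.array(text.split(), dtype=float)


text = make_text(2048 * 1200)
for parse in (parse_fromstring, parse_split):
    # Take the best of three single runs to reduce timing noise
    best = min(timeit.repeat(lambda: parse(text), number=1, repeat=3))
    print("%s: %.3f s" % (parse.__name__, best))
```

Taking the minimum of a few runs tends to be less sensitive to whatever else the machine is doing than a single time.time() difference.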