[Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks

Wed Mar 13 11:43:02 EDT 2013

On 13 Mar 2013 15:16, "Andrea Cimatoribus" <Andrea.Cimatoribus at nioz.nl>
wrote:
>
> Ok, this seems to be working (well, as soon as I get the right offset and
> things like that, but that's a different story).
> The problem is that it does not go any faster than my initial function
> compiled with cython, and it is still a lot slower than fromfile. Is there
> a reason why, even with compiled code, reading from a file skipping some
> records should be slower than reading the whole file?

Oh, in that case you're probably I/O bound, not CPU bound, so Cython etc.
can't help.

Traditional spinning-disk hard drives can read quite quickly, but take a
long time to find the right place to read from and start reading. Your OS
has heuristics in it to detect sequential reads and automatically start the
setup for the next read while you're processing the previous read, so you
don't see the seek overhead. If your reads are widely separated enough,
these heuristics will get confused and you'll drop back to doing a new disk
seek on every call to read(), which is deadly. (And would explain what
you're seeing.) If this is what's going on, your best bet is to just write
a Python loop that uses fromfile() to read some largeish (megabytes?)
chunk, subsample it and throw away the rest, and repeat.
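
A rough sketch of that loop, in case it helps. This assumes the file is a flat
sequence of fixed-size records with no header, and that you want every
`step`-th record; the name `read_subsampled` and the parameters `step` and
`chunk_records` are made up for illustration, and the chunk size is a guess
you would want to tune.

```python
import numpy as np

def read_subsampled(path, dtype, step, chunk_records=1_000_000):
    """Return every `step`-th record of a flat binary file of `dtype` records.

    Hypothetical helper: assumes no header and fixed-size records.
    """
    dtype = np.dtype(dtype)
    kept = []
    offset = 0  # index, within the next chunk, of the first record to keep
    with open(path, "rb") as f:
        while True:
            # One large sequential read; returns fewer items near end of file.
            chunk = np.fromfile(f, dtype=dtype, count=chunk_records)
            if chunk.size == 0:
                break
            # Copy the kept records so the large chunk can be freed.
            kept.append(chunk[offset::step].copy())
            # Carry the stride phase across the chunk boundary.
            offset = (offset - chunk.size) % step
    if not kept:
        return np.empty(0, dtype=dtype)
    return np.concatenate(kept)
```

Something like `read_subsampled("data.bin", np.float64, step=100)` would then
read the whole file sequentially but keep only 1% of it in memory. If the file
has a header, seeking past it with `f.seek()` before the loop should be enough.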

-n