On 13 Mar 2013 15:16, "Andrea Cimatoribus" Andrea.Cimatoribus@nioz.nl wrote:
> Ok, this seems to be working (well, as soon as I get the right offset and
> things like that, but that's a different story).
> The problem is that it does not go any faster than my initial function
> compiled with cython, and it is still a lot slower than fromfile. Is there a reason why, even with compiled code, reading from a file skipping some records should be slower than reading the whole file?

Oh, in that case you're probably IO bound, not CPU bound, so Cython etc. can't help.
Traditional spinning-disk hard drives can read quite quickly, but take a long time to find the right place to read from and start reading. Your OS has heuristics in it to detect sequential reads and automatically start the setup for the next read while you're processing the previous one, so you don't see the seek overhead. If your reads are separated widely enough, these heuristics will get confused and you'll drop back to doing a new disk seek on every call to read(), which is deadly. (And would explain what you're seeing.)

If this is what's going on, your best bet is to just write a Python loop that uses fromfile() to read some largeish (megabytes?) chunk, subsample it and throw away the rest, and repeat.
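
Something like the following is what I have in mind for that loop. It's only a rough sketch: the function name read_subsampled, the "keep every step-th record" rule and the default chunk size are my own placeholders, not anything from your code, and I'm assuming fixed-size records that a single NumPy dtype can describe. You'd also need to add your header offset handling (e.g. an f.seek() before the loop).

```python
import numpy as np


def read_subsampled(path, dtype, step, chunk_records=1_000_000):
    """Read a binary file sequentially in large chunks, keeping every
    `step`-th record (records 0, step, 2*step, ...).

    Hypothetical helper: assumes fixed-size records described by `dtype`
    and no header. Tune `chunk_records` so one chunk is a few megabytes.
    """
    dtype = np.dtype(dtype)
    kept = []
    n_seen = 0  # number of records read so far, across all chunks

    with open(path, "rb") as f:
        while True:
            # One big sequential read; returns fewer records near EOF
            # and an empty array once the file is exhausted.
            chunk = np.fromfile(f, dtype=dtype, count=chunk_records)
            if chunk.size == 0:
                break
            # Local index of the first record in this chunk whose global
            # index is a multiple of `step`, so that the subsampling
            # stays aligned across chunk boundaries.
            first = (-n_seen) % step
            # Copy so the full chunk can be freed right away.
            kept.append(chunk[first::step].copy())
            n_seen += chunk.size

    if not kept:
        return np.empty(0, dtype=dtype)
    return np.concatenate(kept)
```

You'd call it as, say, read_subsampled("data.bin", np.float32, step=100). The point is that the disk only ever sees long sequential reads, and the skipping happens in memory where it's cheap. Whether this actually beats your current version depends on how sparse the subsampling is, so it's worth timing both.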