> Since the files are huge, and would make me run out of memory, I need to read data skipping some records
Is it possible to describe what you're doing with the data once you have subsampled it? And if there were a way to work with the full resolution data, would that be desirable?
I ask because I've been dabbling with a pure-Python library for handling larger-than-memory datasets -
https://github.com/SciTools/biggus. It uses chunking techniques similar to those mentioned in the other replies to process data at the full streaming I/O rate. It's still in the early stages of development, so the design is fluid - maybe it's worth seeing whether there's enough in common with your needs to warrant adding your use case.
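To make the chunking idea concrete (this is just an illustrative sketch of the general technique, not biggus's actual API - the filename, dtype, and chunk size are all assumptions):

```python
import numpy as np

def chunked_mean(path, dtype=np.float64, chunk_records=1_000_000):
    """Compute the mean of a flat binary file of records, one chunk at a time.

    Only one chunk is ever resident in memory, so the whole file
    never needs to fit.
    """
    total = 0.0
    count = 0
    with open(path, 'rb') as f:
        while True:
            # Read the next chunk; np.fromfile returns an empty array at EOF.
            chunk = np.fromfile(f, dtype=dtype, count=chunk_records)
            if chunk.size == 0:
                break
            total += chunk.sum()
            count += chunk.size
    return total / count

print(chunked_mean('huge_data.bin'))  # hypothetical file
```

The point is that many reductions (means, sums, extrema) can be accumulated chunk-by-chunk at streaming I/O rates, so subsampling isn't strictly necessary for them.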
Richard