[Numpy-discussion] Loading a > GB file into array

Mon Dec 3 08:40:27 EST 2007

On Monday 03 December 2007, Martin Spacek wrote:
> Sebastian Haase wrote:
> > reading this thread I have two comments.
> > a) *Displaying* at 200Hz probably makes little sense, since humans
> > would only see about max. of 30Hz (aka video frame rate).
> > Consequently you would want to separate your data frame rate, that
> > (as I understand) you want to save data to disk and -
> > asynchronously - "display as many frames as you can" (I have used
> > pyOpenGL for this with great satisfaction)
>
> Hi Sebastian,
>
> Although 30Hz looks pretty good, if you watch a 60fps movie, you can
> easily tell the difference. It's much smoother. Try recording AVIs on
> a point and shoot digital camera, if you have one that can do both
> 30fps and 60fps (like my fairly old Canon SD200).
>
> And that's just perception. We're doing neurophysiology, recording
> from neurons in the visual cortex, which can phase lock to CRT screen
> rasters up to 100Hz. This is an artifact we don't want to deal with,
> so we use a 200Hz monitor. I need to be certain of exactly what's on
> the monitor on every refresh, ie every 5ms, so I run python (with
> Andrew Straw's package VisionEgg) as a "realtime" priority process in
> windows on a dual core computer, which lets me reliably update the
> video frame buffer in time for the next refresh, without having to
> worry about windows multitasking butting in and stealing CPU cycles
> for the next 15-20ms. Python runs on one core in "realtime", windows
> does its junk on the other core. Right now, every 3rd video refresh
> (ie every 15ms, which is 66.7 Hz, close to the original 60fps the
> movie was recorded at) I update with a new movie frame. That update
> needs to happen in less than 5ms, every time. If there's any disk
> access involved during the update, it inevitably exceeds that time
> limit, so I have to have it all in RAM before playback begins. Having
> a second I/O thread running on the second core would be great though.

Something that should improve your timings is to first read through your data file(s) once, discarding the data as you read it.  The only purpose of this pass is to load the whole file into the OS page cache (provided you have enough memory, which seems to be your case).  Then, the second time your code reads the data, the OS only has to retrieve it from its cache (i.e. from memory) rather than from disk.
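
A minimal sketch of such a warm-up pass, using only the standard library. The function name warm_page_cache and the 16 MB chunk size are my own choices for illustration, not anything from your code, and whether the pages actually stay cached depends on the OS and on how much free memory there is:

```python
def warm_page_cache(path, chunk_size=16 * 1024 * 1024):
    """Read the file at `path` sequentially and throw the data away.

    Returns the number of bytes read. The useful part is the side effect:
    the OS gets a chance to keep the file's pages in its page cache, so a
    later read can be served from memory. This is not guaranteed; the OS
    may evict pages if memory gets tight.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # read one block...
            if not chunk:               # ...until end of file
                break
            total += len(chunk)         # only the size is kept, not the data
    return total

# Hypothetical usage, before playback begins:
#   nbytes = warm_page_cache("movie.dat")
```

Reading in fixed-size chunks keeps the memory used by the Python process small during this pass, no matter how big the file is.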

You can do this with whatever technique you like, but if you want to read from a single container and memmap is giving you headaches on 32-bit platforms, you might try PyTables, because it allows 64-bit disk addressing transparently, even on 32-bit machines.

HTH,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"