Thanks for the fast response, Robert ;)

I changed my code to use the slice:
E = data[6::9]
It is indeed faster and uses less memory. Great.

Thanks for the endianness tip! I knew there was something like this ;) I suspect that, in '>f8', "f" means float and "8" means 8 bytes?
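One way to check that reading of the string is to ask the dtype object itself. This is only a small sketch, assuming NumPy is importable as np; the variable names are mine:

```python
import numpy as np

dt = np.dtype('>f8')

kind = dt.kind            # 'f' is the type code for floating point
itemsize = dt.itemsize    # size of one element, in bytes
byteorder = dt.byteorder  # '>' is big-endian; a big-endian host reports '=' (native)

print(kind, itemsize, byteorder)
```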

From some benchmarks, I see that the slowest thing is disk access. It can slow the displaying of data from around 1 s (when the data is in the OS cache or buffer) to 8 s.

So the next step would be to only read the needed data from the binary file... Is it possible to read from a file with a slice? So instead of:
data = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot)
E = data[6::9]
maybe something like:
E = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot, slice=6::9)
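As far as I can tell, numpy.fromfile has no such slice argument. However, with the layout described in the original post (x varies fastest, z slowest), all the cells of one z plane sit next to each other in the file, so it should be possible to seek to that plane and read only 9*Sx*Sy doubles. The following is a rough sketch under that assumption; the helper name read_plane, its parameters, and the tiny demo sizes are made up for illustration:

```python
import os
import tempfile
import numpy as np

def read_plane(filename, z, Sx, Sy, component=6, n_comp=9, dtype='<f8'):
    """Read one component of the z-th plane without loading the whole file.

    Assumes cell (x, y, z) starts at element n_comp * (x + Sx * (y + Sy * z)),
    i.e. x varies fastest, as implied by reshape((Sz, Sy, Sx), order='C').
    """
    dt = np.dtype(dtype)
    plane_items = n_comp * Sx * Sy
    with open(filename, 'rb') as f:
        f.seek(z * plane_items * dt.itemsize)   # skip the planes before z
        plane = np.fromfile(f, dtype=dt, count=plane_items)
    # The strided slice picks the wanted component; reshape to (Sy, Sx), then
    # transpose so the result is indexed [x, y], like Ex[:, :, z].
    return plane[component::n_comp].reshape((Sy, Sx)).transpose()

# Small self-check against the read-everything approach (made-up sizes).
Sx, Sy, Sz = 4, 3, 5
Stot = Sx * Sy * Sz
full = np.arange(9 * Stot, dtype='<f8')
path = os.path.join(tempfile.mkdtemp(), 'dump.bin')
full.tofile(path)

Ex = full[6::9].reshape((Sz, Sy, Sx)).transpose()
plane = read_plane(path, z=2, Sx=Sx, Sy=Sy)
same = np.array_equal(plane, Ex[:, :, 2])
```

Only one plane's worth of bytes is read per call, so displaying a single z should touch roughly 1/Sz of the file. numpy.memmap may be another option for the same access pattern.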

Thank you!


2008/4/3, Robert Kern <robert.kern@gmail.com>:
On Thu, Apr 3, 2008 at 3:30 PM, Nicolas Bigaouette
<nbigaouette@gmail.com> wrote:
> Hi,
>
> I have a C program which outputs large (~GB) files. It is a simple binary
> dump of an array of structure containing 9 doubles. You can see this as a
> double 1D array of size 9*Stot (Stot being the allocated size of the array
> of structure). The 1D array represents a 3D array (Sx * Sy * Sz = Stot)
> containing 9 values per cell.
>
> I want to read these files in the most efficient way possible, and I would
> like to have your insight on this.
>
> Right now, the fastest way I found was:
> imzeros = zeros((Sy,Sz),dtype=float64,order='C')
> imex = imshow(imzeros)
> f = open(filename, 'rb')
> data = numpy.fromfile(file=f, dtype=numpy.float64, count=9*Stot)
> mask = numpy.arange(6,9*Stot,9)


This is something you can do much, much more efficiently by using a
slice instead of indexing with an integer array.
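A small sketch of the difference, with a made-up array size: both forms select the same elements, but only the integer-array form allocates a new buffer.

```python
import numpy as np

Stot = 1000                      # made-up size for the demo
data = np.arange(9 * Stot, dtype=np.float64)

mask = np.arange(6, 9 * Stot, 9)
by_index = data[mask]            # integer-array indexing: allocates a copy
by_slice = data[6::9]            # basic slicing: a strided view of data

same_values = np.array_equal(by_index, by_slice)
index_shares = np.shares_memory(by_index, data)
slice_shares = np.shares_memory(by_slice, data)
print(same_values, index_shares, slice_shares)
```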


> Ex = data[mask].reshape((Sz,Sy,Sx), order='C').transpose()
> imex.set_array(squeeze(Ex[:,:,z]))
>
> The arrays will be big, so everything should be well optimized. I have
> multiple questions:
>
> 1) Should I change this:
> Ex = data[mask].reshape((Sz,Sy,Sx), order='C').transpose()
> imex.set_array(squeeze(Ex[:,:,z]))
> to:
> imex.set_array(squeeze(data[mask].reshape((Sz,Sy,Sx),
>     order='C').transpose()[:,:,z]))
> I mean, if I don't use a temporary variable, will it be faster or less
> memory hungry?


No. The temporary exists whether you give it a name or not. If you use
data[6::9] instead of data[mask], you won't be using any extra memory
at all. The arrays will just be views into the original array.
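To illustrate with made-up sizes: the whole slice, reshape, transpose chain can be checked for shared memory, and a write through the result shows up in the original buffer. This is only a sketch; the dimensions are arbitrary.

```python
import numpy as np

Sx, Sy, Sz = 4, 3, 5             # arbitrary demo dimensions
data = np.arange(9 * Sx * Sy * Sz, dtype=np.float64)

Ex = data[6::9].reshape((Sz, Sy, Sx), order='C').transpose()
plane = Ex[:, :, 2]

still_view = np.shares_memory(plane, data)   # no copy anywhere in the chain
owns = Ex.flags['OWNDATA']                   # Ex borrows its memory from data

# Writing through the view changes the original buffer.
Ex[0, 0, 0] = -1.0
wrote_through = bool(data[6] == -1.0)
```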


> 2) If not, does the operation "Ex = " update the variable's data or create
> another one?


It just reassigns the name "Ex" to a different object specified on the
right-hand side of the assignment. The relevant question is whether the
expression on the right-hand side takes up more memory.


> Ideally I would like to only update it. Maybe this would be
> better:
>
> Ex[:,:,:] = data[mask].reshape((Sz,Sy,Sx), order='C').transpose()
>
> Would it?


If you use data[6::9] instead of data[mask], you should just use "Ex =
" since no new memory will be used on the RHS.
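A short sketch of the two kinds of assignment, with made-up sizes: "Ex = " only rebinds the name, while "buf[:] = " copies values into memory that already exists.

```python
import numpy as np

data = np.arange(90, dtype=np.float64)   # made-up size: 10 cells of 9 values

Ex = np.zeros(10)
old_buffer = Ex

Ex = data[6::9]                  # rebinds the name; the old array is untouched
rebound = Ex is not old_buffer
old_unchanged = not old_buffer.any()

buf = np.zeros(10)
buf[:] = data[6::9]              # copies the values into buf's existing memory
copied = np.array_equal(buf, data[6::9])
independent = not np.shares_memory(buf, data)
```

With the slice on the right-hand side, the first form costs no extra array memory, whereas the second keeps a separate buffer of the same size alive.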


> 3) The machine where the code will be run might be big-endian. Is there a
> way for python to read the big-endian file and "translate" it automatically
> to little-endian? Something like "numpy.fromfile(file=f,
> dtype=numpy.float64, count=9*Stot, endianness='big')"?


dtype=numpy.dtype('>f8')
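A minimal sketch of how that dtype behaves, using an in-memory byte string as a stand-in for the file contents (the sample values are made up). numpy.fromfile accepts the same dtype argument.

```python
import numpy as np

values = np.array([1.0, 2.5, -3.0])
raw = values.astype('>f8').tobytes()     # stand-in for big-endian file content

be = np.frombuffer(raw, dtype=np.dtype('>f8'))   # interpreted as big-endian
native = be.astype(np.float64)                   # converted to native byte order

ok = np.array_equal(be, values) and np.array_equal(native, values)

# Reading the same bytes as little-endian gives different numbers.
wrong = np.frombuffer(raw, dtype='<f8')
mismatch = not np.array_equal(wrong, values)
```

The astype call is optional: arithmetic on the '>f8' array already gives correct values, and the conversion only changes how the bytes are stored in memory.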

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco