[SciPy-user] Pickling Large (Image) Arrays

Tue Sep 23 19:13:16 EDT 2008

2008/9/23 Keith Suda-Cederquist <kdsudac at yahoo.com>:

> I'm doing some image processing on some rather large images (2000x2000
> pixels and each pixel has 16 bits) so the file comes in at 7-8 MB.  During
> the image processing I convert the image to a 64-bit float numpy array and
> do a bunch of operations on the image.
>
> In certain cases (where tests fail), I'd like to save all the data to a file
> to take a look at later and debug.  I need to keep the size of this file as
> small as possible.
>
> I'm thinking of writing some code that will round pixel values to an 8-bit
> unsigned integer and then pickle the data to a file.  Is this a good
> approach?  Can anyone suggest a better approach?  Will this actually succeed
> in reducing the file size, or will I just be wasting my time?

First of all, the current release of numpy includes a native file
format (written by numpy.save, read by numpy.load) that is fairly
efficient, fast, and portable. If you can, it's probably better to use
that than pickles. But by itself it won't save
all that much space: almost all the space in either format is taken up
by the pixel array. If you convert the pixel array to one with dtype
uint8 or uint16, you'll use one or two bytes per pixel instead of eight. You
do of course lose information this way, and if this obscures why the
test is failing, it will be quite frustrating. If the data is very
compressible, you could look into using the Python Imaging Library to
save it in some compressed image format, though this will almost
certainly lose even more information.
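As a rough illustration of where the space goes, here is a minimal sketch comparing file sizes for a 2000x2000 float64 array saved as-is with numpy.save, pickled, and linearly rescaled to uint16 and uint8. The helpers quantize/dequantize are hypothetical names, the random image is a stand-in for real data, and np.random.default_rng needs a much newer numpy than the 2008 release; the rescaling keeps an offset and step so values can be approximately restored (error at most half a step).

```python
import os
import pickle
import tempfile

import numpy as np


def quantize(img, dtype=np.uint16):
    """Hypothetical helper: linearly rescale a float array onto the full
    range of an unsigned integer dtype. Returns the integer array plus the
    (offset, step) needed to map it back approximately."""
    lo, hi = float(img.min()), float(img.max())
    top = np.iinfo(dtype).max
    step = (hi - lo) / top if hi > lo else 1.0
    q = np.clip(np.round((img - lo) / step), 0, top).astype(dtype)
    return q, lo, step


def dequantize(q, lo, step):
    """Approximate inverse of quantize(); error is at most step / 2."""
    return q.astype(np.float64) * step + lo


# Stand-in for a processed 2000x2000 image (random data, illustration only).
rng = np.random.default_rng(0)
img = rng.normal(1000.0, 50.0, size=(2000, 2000))

outdir = tempfile.mkdtemp()
arrays = {"float64": img}
for dt in (np.uint16, np.uint8):
    arrays[np.dtype(dt).name] = quantize(img, dt)[0]

sizes = {}
for name, arr in arrays.items():
    path = os.path.join(outdir, name + ".npy")
    np.save(path, arr)  # .npy: small header followed by the raw array bytes
    sizes[name] = os.path.getsize(path)
    print(f"{name:8s} {sizes[name] / 1e6:5.1f} MB")

# A pickle of the float64 array is about the same size as the .npy file.
pickled_size = len(pickle.dumps(img, protocol=pickle.HIGHEST_PROTOCOL))
print(f"pickle   {pickled_size / 1e6:5.1f} MB")
```

The offset and step returned by quantize are not stored in the .npy file above; to restore approximate values later they would have to be saved alongside the array, for example with np.savez(path, data=q, lo=lo, step=step).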

Anne