[AstroPy] writing/compressing fits files

Evert Rol evert.rol at gmail.com
Fri Apr 15 02:49:54 EDT 2016


Kyle, here's a guess at what might be going on. 
I assume your file has multiple HDUs; at this file size, that makes sense.

When astropy.io.fits encounters a .gz filename to write to, it opens a gzip file object and then iterates over the individual HDUs, writing each one to the file in turn. 
I suspect this is not optimal for gzip, which compresses best when it can see larger blocks of data at once.
I don't know how the Python gzip module is implemented, but it may be recomputing the compression state after (or during) every write of a single HDU. That would explain why it takes so long. Or it may simply gzip every HDU individually, so that you are effectively paying the compression overhead once per HDU.
If there are quite a few HDUs and each HDU is compressed individually, the compression ratio also suffers (gzip allows concatenated gzip streams inside a single file, but each stream compresses less well than the same data gzipped as one stream).
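That last point is easy to demonstrate with the stdlib alone (this is not astropy code, just a sketch of why per-chunk gzipping costs extra bytes; the payload and chunk count are made up for illustration):

```python
import gzip

# Repetitive payload, split into chunks to mimic many similar HDUs.
data = b"SIMPLE  =                    T / conforming FITS header " * 2000
chunk_size = len(data) // 10
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# One gzip stream over the whole payload...
whole = gzip.compress(data)

# ...versus one gzip stream per chunk, concatenated.
# The result is still a valid multi-member .gz file, but each member
# restarts compression from scratch and carries its own header.
per_chunk = b"".join(gzip.compress(c) for c in chunks)

print(len(whole), len(per_chunk))  # the concatenated streams come out larger
```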

Of course, all this depends on the number of HDUs in your file; if there are only a few, this is unlikely to be the cause.
If you do have a large number of HDUs, you can test whether this is the problem by creating a single file of about the same size with just one HDU (probably filled with random data, for non-optimal compression; or perhaps your actual data saved into a PrimaryHDU as an N+1 dimensional array). Then save that 1-element HDUList as a gzip FITS file and see how fast that is (depending on the data used, the compressed size may or may not be around 14 MB).
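As an aside on why random data makes a good worst case for such a test: random bytes are essentially incompressible, so the gzip step runs at full cost without shrinking the file. A quick stdlib-only check (no astropy involved; the 1 MB size is arbitrary):

```python
import gzip
import os

# Random bytes barely compress, so they stress the gzip step itself.
random_payload = os.urandom(1_000_000)
compressed = gzip.compress(random_payload)

# The ratio ends up at (or slightly above) 1.0, since gzip still adds
# its own header and block overhead on top of the incompressible data.
ratio = len(compressed) / len(random_payload)
print(f"compression ratio: {ratio:.3f}")
```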

Cheers,

  Evert

ps: I don't know CompImageHDU, but a quick glance suggests that it implements compression of the data within an HDU. That might be somewhat faster (since there is no need to go back and forth in the file itself), but with many HDUs the overall compression is still less than optimal.



> Hi all,
> 
> I'm getting what feels like odd behavior in astropy.  I have a file that is 386 MB when written uncompressed by astropy.io.fits.HDUList.writeto().  If I try to write the file as *.fits.gz, instead of *.fits, it is compressed down to 77 MB.  The problem is that it takes ~2 minutes longer to write the compressed version.
> 
> 
> Alternatively, I've just compressed the file using the gzip executable on my Mac:
> 
> % gzip -V
> Apple gzip 242
> 
> This compresses the file to 14 MB in a few seconds.
> 
> 
> 
> I want to avoid system calls, so I also tried the following:
> 
> >>> import gzip
> >>> import shutil
> >>> with open('test.fits', 'rb') as f_in:
> ...     with gzip.open('test.fits.gz', 'wb') as f_out:
> ...         shutil.copyfileobj(f_in, f_out)
> ...
> 
> pulled from here:
> 
> https://docs.python.org/3/library/gzip.html#examples-of-usage
> 
> This takes slightly longer than the system call, but not nearly as long as the astropy.io.fits.HDUList.writeto() command.  And this call actually compresses the file down to 12 MB.  So the above is my short-term work-around, but I'm wondering if there are better options.
> 
> One option that was suggested to me was to use CompImageHDU, instead of ImageHDU.  So I'll run some tests with that, as well.
> 
> Thanks for any and all advice,
> Kyle
> 
> 
> 
> 
> -- 
> ------------------------------
> Kyle B. Westfall
> 
> kyle.westfall at port.ac.uk
> 
> 
> Institute of Cosmology
>     and Gravitation (ICG)
> University of Portsmouth
> Dennis Sciama Building                     
> Burnaby Road
> Portsmouth PO1 3FX
> United Kingdom
> 
> +44 (0)23 9284 5158
> 
> www.icg.port.ac.uk/~westfall/
> 
> ------------------------------
> 
> 
> 
> _______________________________________________
> AstroPy mailing list
> AstroPy at scipy.org
> https://mail.scipy.org/mailman/listinfo/astropy
