[AstroPy] writing 2D string column to fits binary table

Stephen Bailey stephenbailey at lbl.gov
Tue Jan 27 18:22:39 EST 2015


On Tue, Jan 27, 2015 at 10:28 AM, Erin Sheldon <erin.sheldon at gmail.com>
wrote:

> Hi Stephen -
>
> This is a difficult problem I think.
>
> In FITS, the strings must get padded with spaces when written.  That is my
> understanding of the format.
>

From
https://archive.stsci.edu/fits/fits_standard/node70.html#SECTION001233130000000000000:

8.3.3 Data Sequence -> 8.3.3.1 Main Data Table -> Character

If the value of the TFORMn keyword specifies data type A, field n shall
contain a character string of zero or more members, composed of ASCII
text. This character string may be terminated before the length specified
by the repeat count by an ASCII NULL (hexadecimal code 00). Characters
after the first ASCII NULL are not defined. A string with the number of
characters specified by the repeat count is not NULL terminated. Null
strings are defined by the presence of an ASCII NULL as the first character.


This indicates that padding with NULLs is the correct thing to do when the
string is shorter than the field.  This is also how the data are laid out in
ndarray memory, and it keeps short strings distinguishable from strings that
end with trailing spaces when reading them back in.
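
As a quick illustration of that last point, here is a minimal sketch using
only numpy (no FITS library involved; the variable names are just for this
example).  It shows that a short value in an 'S5' field is NULL padded in
memory and stays distinct from a value with genuine trailing spaces:

```python
import numpy as np

# A fixed-width 5-byte string field, like the 'S5' column in the example.
short = np.array([b'a'], dtype='S5')
spaced = np.array([b'a    '], dtype='S5')

# Raw memory: the short string is padded with NULLs, not spaces.
print(short.tobytes())    # b'a\x00\x00\x00\x00'
print(spaced.tobytes())   # b'a    '

# numpy drops trailing NULLs on element access but keeps trailing spaces,
# so the two cases remain distinguishable.
print(short[0], spaced[0])
print(short[0] == spaced[0])   # False
```

If the file were padded with spaces instead, both values would come back as
'a    ' and the distinction would be gone.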

> There is no way to know the intention of the user, whether spaces in the
> original data are "significant" or not.  This information is not available
> in a FITS file, only the size of the field in bytes.


> The convention in many codes is to strip all trailing whitespace on
> reading.  If I recall correctly, IDL mrdfits does not strip.
>
> If a user thinks the whitespace is significant they will be surprised when
> reading that the spaces are not there.  If the user thinks the whitespace
> is not significant they may be surprised if it is there.
>
> I took what I think is probably a controversial stance:  I will not lose
> user data.  So I always read the full field and retain spaces.  It is up
> to the user to strip them if that matters.
>
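
For reference, stripping on the user side could look something like the
sketch below.  This is only an illustration: strip_string_fields is a
hypothetical helper written for this message, not part of fitsio or
astropy, and the "padded" array is a hand-built stand-in for what a reader
that keeps the padding would return.

```python
import numpy as np

def strip_string_fields(arr):
    """Return a copy of a structured array with trailing whitespace removed
    from every bytes ('S') field.  Hypothetical helper, not a library API."""
    out = arr.copy()
    for name in out.dtype.names:
        # .base looks through sub-array fields such as ('ABC', 'S5', (3,))
        if out.dtype[name].base.kind == 'S':
            out[name] = np.char.rstrip(out[name])
    return out

# Stand-in for the output of a reader that retains space padding.
dtype = [('ABC', 'S5', (3,)), ('X', int)]
padded = np.zeros(2, dtype=dtype)
padded['ABC'] = [[b'x    ', b'y    ', b'z    '],
                 [b'a    ', b'b    ', b'c    ']]
padded['X'] = [0, 1]

stripped = strip_string_fields(padded)
print(stripped['ABC'])
```

Note that this also removes trailing spaces the user wrote on purpose, which
is exactly the loss of information being discussed.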

I completely agree with the philosophy of not losing user data, even when
it is a pain for the user.  But in this case, I think the standard allows
(even specifies!) padding with NULLs instead of spaces if the input strings
are genuinely shorter than the field.

Agree?

Stephen



>
> I am willing to reconsider based on good arguments.
>
> An idea comes to mind: maybe support a strip_strings= keyword for the FITS
> object and the reader routines.
>
> thanks for the discussion,
> -e
>
> On 1/26/15, Stephen Bailey <stephenbailey at lbl.gov> wrote:
> > astropy.io.fits and fitsio aficionados,
> >
> > I'm trying to write a FITS binary table that includes a column that is
> > a 2D array of strings.  Curiously, only files written by astropy.io.fits
> > and read by fitsio pass the test of actually reconstructing the input
> > numpy array.
> >
> > astropy.io.fits loses the 2D information when reading back in and ends up
> > with a 1D array of null separated characters per row.
> >
> > fitsio returns strings padded with spaces when reading its own file
> > back in.
> >
> > But the file written by astropy.io.fits and then read by fitsio gets me
> > back what I put in.
> >
> > Details:
> >
> > import numpy as np
> > from astropy.io import fits
> > import fitsio
> >
> > n = 10
> > # 'ABC' holds 3 strings of up to 5 bytes per row (a 2D string column)
> > dtype = [('ABC', 'S5', (3,)), ('X', int), ('Y', float)]
> > data = np.zeros(n, dtype=dtype)
> > data['X'] = np.arange(n)
> > data['Y'] = np.arange(n)
> > data['ABC'][:, 0] = 'a'
> > data['ABC'][:, 1] = 'b'
> > data['ABC'][:, 2] = 'c'
> > data['ABC'][0] = ['x', 'y', 'z']
> >
> > # write the same array with both libraries
> > fits.writeto('apio.fits', data)
> > fitsio.write('fio.fits', data)
> >
> > astropy.io.fits complains:
> >
> > /Users/sbailey/anaconda/lib/python2.7/site-packages/astropy/io/fits/fitsrec.py:782:
> > UserWarning: TDIM1 value (5,3) does not fit with the size of the array
> > items (5).  TDIM1 will be ignored.
> >   actual_nitems, indx + 1))
> >
> > fitsio seems happy.  Reading it back in:
> >
> > In [15]: np.array(fits.getdata('apio.fits'))
> > Out[15]:
> > array([('x\x00\x00\x00\x00y\x00\x00\x00\x00z', 0, 0.0),
> >        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 1, 1.0),
> >        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 2, 2.0),
> >        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 3, 3.0),
> >        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 4, 4.0),
> >        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 5, 5.0),
> >        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 6, 6.0),
> >        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 7, 7.0),
> >        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 8, 8.0),
> >        ('a\x00\x00\x00\x00b\x00\x00\x00\x00c', 9, 9.0)],
> >       dtype=[('ABC', 'S15'), ('X', '>i8'), ('Y', '>f8')])
> >
> >
> > In [16]: fitsio.read('fio.fits')
> > Out[16]:
> > array([(['x    ', 'y    ', 'z    '], 0, 0.0),
> >        (['a    ', 'b    ', 'c    '], 1, 1.0),
> >        (['a    ', 'b    ', 'c    '], 2, 2.0),
> >        (['a    ', 'b    ', 'c    '], 3, 3.0),
> >        (['a    ', 'b    ', 'c    '], 4, 4.0),
> >        (['a    ', 'b    ', 'c    '], 5, 5.0),
> >        (['a    ', 'b    ', 'c    '], 6, 6.0),
> >        (['a    ', 'b    ', 'c    '], 7, 7.0),
> >        (['a    ', 'b    ', 'c    '], 8, 8.0),
> >        (['a    ', 'b    ', 'c    '], 9, 9.0)],
> >       dtype=[('ABC', 'S5', (3,)), ('X', '>i8'), ('Y', '>f8')])
> >
> >
> > In [17]: data
> > Out[17]:
> > array([(['x', 'y', 'z'], 0, 0.0), (['a', 'b', 'c'], 1, 1.0),
> >        (['a', 'b', 'c'], 2, 2.0), (['a', 'b', 'c'], 3, 3.0),
> >        (['a', 'b', 'c'], 4, 4.0), (['a', 'b', 'c'], 5, 5.0),
> >        (['a', 'b', 'c'], 6, 6.0), (['a', 'b', 'c'], 7, 7.0),
> >        (['a', 'b', 'c'], 8, 8.0), (['a', 'b', 'c'], 9, 9.0)],
> >       dtype=[('ABC', 'S5', (3,)), ('X', '<i8'), ('Y', '<f8')])
> >
> > But fitsio reading the file that astropy.io.fits wrote is good (despite
> > the warning that astropy.io.fits gave when writing the file):
> >
> > In [18]: fitsio.read('apio.fits')
> > Out[18]:
> > array([(['x', 'y', 'z'], 0, 0.0), (['a', 'b', 'c'], 1, 1.0),
> >        (['a', 'b', 'c'], 2, 2.0), (['a', 'b', 'c'], 3, 3.0),
> >        (['a', 'b', 'c'], 4, 4.0), (['a', 'b', 'c'], 5, 5.0),
> >        (['a', 'b', 'c'], 6, 6.0), (['a', 'b', 'c'], 7, 7.0),
> >        (['a', 'b', 'c'], 8, 8.0), (['a', 'b', 'c'], 9, 9.0)],
> >       dtype=[('ABC', 'S5', (3,)), ('X', '>i8'), ('Y', '>f8')])
> >
> > Final check of combinations:
> >
> > In [19]: np.all(data == np.array(fits.getdata('apio.fits')))
> > Out[19]: False
> >
> > In [20]: np.all(data == np.array(fits.getdata('fio.fits')))
> > Out[20]: False
> >
> > In [21]: np.all(data == fitsio.read('apio.fits'))
> > Out[21]: True
> >
> > In [22]: np.all(data == fitsio.read('fio.fits'))
> > Out[22]: False
> >
> > Feature?  Bug?  User error?  Other?
> >
> > Thanks for the help,
> >
> > Stephen
> >
>
>
> --
> Erin Scott Sheldon
> Brookhaven National Laboratory erin dot sheldon at gmail dot com
>