[AstroPy] saving tables as VOtables with missing values

Thomas Robitaille thomas.robitaille at gmail.com
Thu Oct 16 06:24:03 EDT 2014


Hi Susana,

Susana Sanchez Exposito wrote:
> Hi again,
> I tried to use None,which in python it is the best representation of a
> missing value, and the result is the astropy.table.Table method
> considers None as the string "None"

Indeed - at the moment the only way to get the behavior you want is to
use a value with the same type as the column for the 'null' values, so
for example if you want the column to be an integer column, you should
have only integer values in that column (so no None, empty string, or
not even nan since that is floating-point). For floating point columns
you can use nan but not None or not empty strings.

As Mike said, provided you set the mask correctly, the values you use
don't matter. For example:

from astropy.table import Table

list = [["CIG1", 1.2, 3, 0, 3], ["CIG2", 3, 2.0, 5, 2], ["CIG3", 2,
3,2.6,  float('nan')] ]
tab=Table(rows=list, names=["A", "B", "C", "D", "E"],  masked=True)
tab['B'].mask[1] = True
tab['D'].mask[0] = True
tab.write('test.xml', format='votable')

produces:

  <TABLE>
   <FIELD ID="A" arraysize="4" datatype="char" name="A"/>
   <FIELD ID="B" datatype="double" name="B"/>
   <FIELD ID="C" datatype="double" name="C"/>
   <FIELD ID="D" datatype="double" name="D"/>
   <FIELD ID="E" datatype="double" name="E"/>
   <DATA>
    <TABLEDATA>
     <TR>
      <TD>CIG1</TD>
      <TD>1.2</TD>
      <TD>3</TD>
      <TD/>
      <TD>3</TD>
     </TR>
     <TR>
      <TD>CIG2</TD>
      <TD/>
      <TD>2</TD>
      <TD>5</TD>
      <TD>2</TD>
     </TR>
     <TR>
      <TD>CIG3</TD>
      <TD>2</TD>
      <TD>3</TD>
      <TD>2.6000000000000001</TD>
      <TD>NaN</TD>
     </TR>
    </TABLEDATA>
   </DATA>
  </TABLE>

As you can see, the masked entries are save as:

      <TD/>

Now we could consider recognizing None as a masked value, but this could
be tricky and slow things down a lot. Maybe Tom Aldcroft can comment on
this?

Cheers,
Tom

> 
> list = [["CIG1", 1.2, 3, None, 3], [None, 3, 2.0, 5, 2], ["CIG3", 2,
> 3,2.6,  float('nan')] ]
> tab=astropy.table.Table(rows=list, names=["A", "B", "C", "D", "E"], 
> masked=True)
> votable=astropy.io.votable.from_table(tab)
> votable.to_xml("/home/sse/Desktop/test.xml")
> 
> And the VOTable:
> 
> <TABLE>
>    <FIELD ID="A" arraysize="*" datatype="unicodeChar" name="A"/>
>    <FIELD ID="B" datatype="double" name="B"/>
>    <FIELD ID="C" datatype="double" name="C"/>
>    <FIELD ID="D" arraysize="*" datatype="unicodeChar" name="D"/>
>    <FIELD ID="E" datatype="double" name="E"/>
>    <DATA>
>     <TABLEDATA>
>      <TR>
>       <TD>CIG1</TD>
>       <TD>1.2</TD>
>       <TD>3</TD>
>       <TD>None</TD>
>       <TD>3</TD>
>      </TR>
>      <TR>
>       <TD>None</TD>
>       <TD>3</TD>
>       <TD>2</TD>
>       <TD>5</TD>
>       <TD>2</TD>
>      </TR>
>      <TR>
>       <TD>CIG3</TD>
>       <TD>2</TD>
>       <TD>3</TD>
>       <TD>2.6</TD>
>       <TD>NaN</TD>
>      </TR>
>     </TABLEDATA>
> 
> 
> 
> 2014-10-16 11:39 GMT+02:00 Susana Sanchez Exposito <sse at iaa.es
> <mailto:sse at iaa.es>>:
> 
> 
> 
>     2014-10-15 18:37 GMT+02:00 Michael Droettboom <mdroe at stsci.edu
>     <mailto:mdroe at stsci.edu>>:
> 
>         On 10/15/2014 09:20 AM, Susana Sanchez Exposito wrote:
> 
>>         Thanks for your answer Tom!
>>
>>         I have tested what you proposed, and I found two things:
>>
>>         1) If you replace the empty string by a value (e.g. np.nan) in
>>         order to get a mask array, this value will remain in the
>>         resulting VOTable, and in some cases it could be strange. I
>>         illustrate this with this example (note the first column of
>>         the table is the object name):
>>
>>         list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2], ["CIG3",
>>         2, 3,2.6,  float('nan')], ]
>>         to_nan=[]
>>         for r in list:
>>             to_nan.append([numpy.nan if (str(x).upper()=="NAN" or
>>         x=="") else x for x in r])
>>
>>         tab=astropy.table.Table(rows=to_nan, names=["A", "B", "C",
>>         "D", "E"],  masked=True)
>>         votable=astropy.io.votable.from_table(tab)
>>         votable.to_xml("/home/sse/Desktop/test.xml")
>>
>>         If you open the resulting VOtable (attached as test.xml) in
>>         Topcat you will see the 'nan' string in the object name
>>         column, which is very strange.
> 
>         The loop you provide above is putting the string “nan”, not the
>         floating-point value NaN in the column — since the column is a
>         string column, there’s really nothing else it can do. Signs
>         point to using your approach below, with an explicit mask, for
>         this reason.
> 
>     mmm, I am pretty sure the loop above is putting the float value NaN.
>     This ipython output shows that:
> 
> 
>     In [2]: list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2],
>     ["CIG3", 2, 3,2.6,  float('nan')], ]
>     In [3]: to_nan=[]
>     In [4]: for r in list:
>        ...:         to_nan.append([numpy.nan if (str(x).upper()=="NAN"
>     or x=="") else x for x in r])
>        ...:    
>     In [5]: print to_nan
>     [['CIG1', 1.2, 3, nan, 3], [nan, 3, 2.0, 5, 2], ['CIG3', 2, 3, 2.6,
>     nan]]
> 
>     Or maybe you mean other thing ?
> 
>      
> 
>>
>>
>>         Ok. I can build the mask array in other ways, BUT:
>>
>>         2) Even providing a mask, the column types are misinterpret. 
>>         The following example illustrates what I mean:
>>
>>          
>>         list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2], ["CIG3",
>>         2, 3,2.6,  float('nan')] ]
>>
>>         mask_empty=[[False, False, False, True, False], [True, False,
>>         False, False, False], [False, False, False,False,  True] ]
>>         tab=astropy.table.Table(rows=list, names=["A", "B", "C", "D",
>>         "E"],  masked=True)
>>         tab.mask= numpy.array (mask_empty)
>>         votable=astropy.io.votable.from_table(tab)
>>         votable.to_xml("/home/sse/Desktop/test.xml")
>>
>>         If you open the resulting VOtable (attached as test2.xml) you
>>         will see that the 4th column is typed as char (<FIELD ID="D"
>>         arraysize="3" datatype="char" name="D"/>), when the only
>>         string in the column is masked!
>>
>>         Maybe I am doing something wrong?
> 
>         The determination of column types is independent of masking,
>         unfortunately, and happens on the line
>         |tab=astropy.table.Table(...| above before the table is even
>         made aware of the mask. In general, the masked values need to be
>         of the column type. But since it is masked, it shouldn’t matter
>         what that actual value is.
> 
>     Ok.This is what I was imaging.
> 
>     Thanks for your answer, Mike!
>      
> 
>         Mike
> 
>>
>>
>>
>>         2014-10-15 11:42 GMT+02:00 Thomas Robitaille
>>         <thomas.robitaille at gmail.com
>>         <mailto:thomas.robitaille at gmail.com>>:
>>
>>             Hi Susana,
>>
>>             Just to simplify your current workflow, what you are doing
>>             is equivalent to:
>>
>>             tab = Table(rows=list, names=...)
>>             tab.write('test.xml', format='votable')
>>
>>             so no need for the zip(*) call and the call to
>>             votable.from_table.
>>
>>             Now in terms of the masked values, I think the easiest is
>>             actually to
>>             just give some integer value to the missing value and then
>>             set the mask, so:
>>
>>             In [19]: t = Table(rows=list,
>>                                names=["A", "B", "C", "D", "E"],
>>                                masked=True)
>>
>>             In [20]: t['D'].mask[0] = True
>>
>>             In [21]: print(t)
>>              A    B   C   D   E
>>             ---- --- --- --- ---
>>             CIG1 1.2 3.0  -- 3.0
>>             CIG1 3.0 2.0 5.0 2.0
>>             CIG3 2.0 3.0 2.6 nan
>>
>>             and then write out to votable with:
>>
>>             t.write('test.xml', format='votable')
>>
>>             which should preserve the masks.
>>
>>             If you chose to use a special value (e.g. np.nan) to
>>             indicate masked
>>             values, then you can do:
>>
>>             t['D'].mask = np.isnan(t['D'].mask)
>>
>>             Let me know if any of the above isn't clear, or if it
>>             doesn't solve your
>>             issue!
>>
>>             Cheers,
>>             Tom
>>
>>             Susana Sanchez Exposito wrote:
>>             >
>>             > Hi all,
>>             >
>>             > I work on an interface where the user can view and edit
>>             tables and save
>>             > them as VOtables, and for that I use the Astropy library.
>>             >
>>             > I keep the data of the tables in python lists, so to
>>             save them as
>>             > VOtable I do this:
>>             >
>>             > list = [["CIG1", 1.2, 3, "", 3], ["CIG1", 3, 2.0, 5, 2],
>>             ["CIG3", 2,
>>             > 3,2.6,  float('nan')]]
>>             >
>>             > #transform the list of rows into list of columns
>>             > list_cols= zip(*list)
>>             >
>>             > tab=astropy.table.Table(list_cols, names=["A", "B", "C",
>>             "D", "E"])
>>             > votable=astropy.io.votable.from_table(tab)
>>             > votable.to_xml("/home/sse/Desktop/test.xml")
>>             >
>>             > The table.Table method interprets correctly  the type of
>>             each column
>>             > except for the 3rd column. This column contains a
>>             "missing value" or
>>             > maybe a value deleted by the user, so the whole columns
>>             is marked as
>>             > "string" type, when actually is float type.
>>             >
>>             > I could transform all empty strings into NaN but this
>>             will be strange
>>             > for those columns containing strings .
>>             >
>>             > I have tried to transform the python list into a masked
>>             array, but
>>             > without success: problems to mask empty strings.
>>             >
>>             > Maybe I should to find out the type of each columns,
>>             going over the
>>             > table and calculating the type of the majority of the
>>             column item, and
>>             > then pass this type array to the table.Table method in
>>             some way ??
>>             >
>>             > So before, to continue investigating, I would like to
>>             ask you for some
>>             > tips to solve the missing values problems with astropy,
>>             or maybe even if
>>             > there is a specific method for that.
>>             >
>>             > Thanks in advanced.
>>             >
>>             > Susana.
>>             >
>>             >
>>             >
>>             >
>>             >
>>             >
>>             >
>>             > --
>>             > Susana Sánchez Expósito
>>             >
>>             > Instituto de Astrofísica de Andalucía   IAA (CSIC)
>>             > Camino Bajo de Huétor, 50. Granada E-18008
>>             > Tel:(+34) 958 121 311
>>             <tel:%28%2B34%29%20958%20121%20311> / (+34) 958 230 618
>>             <tel:%28%2B34%29%20958%20230%20618>
>>             > Fax:(+34) 958 814 530 <tel:%28%2B34%29%20958%20814%20530>
>>             > e-mail: sse at iaa.es <mailto:sse at iaa.es>
>>             <mailto:sse at iaa.es <mailto:sse at iaa.es>>
>>             >
>>             > _______________________________________________
>>             > AstroPy mailing list
>>             > AstroPy at scipy.org <mailto:AstroPy at scipy.org>
>>             > http://mail.scipy.org/mailman/listinfo/astropy
>>             _______________________________________________
>>             AstroPy mailing list
>>             AstroPy at scipy.org <mailto:AstroPy at scipy.org>
>>             http://mail.scipy.org/mailman/listinfo/astropy
>>
>>
>>
>>
>>         -- 
>>         Susana Sánchez Expósito
>>
>>         Instituto de Astrofísica de Andalucía   IAA (CSIC)
>>         Camino Bajo de Huétor, 50. Granada E-18008
>>         Tel:(+34) 958 121 311 <tel:%28%2B34%29%20958%20121%20311> /
>>         (+34) 958 230 618 <tel:%28%2B34%29%20958%20230%20618>
>>         Fax:(+34) 958 814 530 <tel:%28%2B34%29%20958%20814%20530>
>>         e-mail: sse at iaa.es <mailto:sse at iaa.es>
>>
>>
>>         _______________________________________________
>>         AstroPy mailing list
>>         AstroPy at scipy.org <mailto:AstroPy at scipy.org>
>>         http://mail.scipy.org/mailman/listinfo/astropy
> 
>> 
>         -- 
>         Michael Droettboom
>         Science Software Branch
>         Space Telescope Science Institute
> 
>         http://www.droettboom.com
> 
> 
>         _______________________________________________
>         AstroPy mailing list
>         AstroPy at scipy.org <mailto:AstroPy at scipy.org>
>         http://mail.scipy.org/mailman/listinfo/astropy
> 
> 
> 
> 
>     -- 
>     Susana Sánchez Expósito
> 
>     Instituto de Astrofísica de Andalucía   IAA (CSIC)
>     Camino Bajo de Huétor, 50. Granada E-18008
>     Tel:(+34) 958 121 311 <tel:%28%2B34%29%20958%20121%20311> / (+34)
>     958 230 618 <tel:%28%2B34%29%20958%20230%20618>
>     Fax:(+34) 958 814 530 <tel:%28%2B34%29%20958%20814%20530>
>     e-mail: sse at iaa.es <mailto:sse at iaa.es>
> 
> 
> 
> 
> -- 
> Susana Sánchez Expósito
> 
> Instituto de Astrofísica de Andalucía   IAA (CSIC)
> Camino Bajo de Huétor, 50. Granada E-18008
> Tel:(+34) 958 121 311 / (+34) 958 230 618
> Fax:(+34) 958 814 530
> e-mail: sse at iaa.es <mailto:sse at iaa.es>
> 
> _______________________________________________
> AstroPy mailing list
> AstroPy at scipy.org
> http://mail.scipy.org/mailman/listinfo/astropy



More information about the AstroPy mailing list