[AstroPy] saving tables as VOtables with missing values

Michael Droettboom mdroe at stsci.edu
Wed Oct 15 12:37:40 EDT 2014


On 10/15/2014 09:20 AM, Susana Sanchez Exposito wrote:

> Thanks for your answer Tom!
>
> I have tested what you proposed, and I found two things:
>
> 1) If you replace the empty string with a value (e.g. np.nan) in order
> to get a masked array, this value will remain in the resulting VOTable,
> and in some cases the result looks strange. I illustrate this with this
> example (note the first column of the table is the object name):
>
> list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2],
>         ["CIG3", 2, 3, 2.6, float('nan')]]
> to_nan = []
> for r in list:
>     to_nan.append([numpy.nan if (str(x).upper() == "NAN" or x == "") else x
>                    for x in r])
>
> tab = astropy.table.Table(rows=to_nan, names=["A", "B", "C", "D", "E"],
>                           masked=True)
> votable = astropy.io.votable.from_table(tab)
> votable.to_xml("/home/sse/Desktop/test.xml")
>
> If you open the resulting VOtable (attached as test.xml) in Topcat you 
> will see the 'nan' string in the object name column, which is very 
> strange.

The loop above puts the string "nan", not the floating-point value NaN,
in the first column. Because that column holds strings, the NaN is
converted to text, and there is really nothing else it can do. For this
reason, your approach below, with an explicit mask, is the better option.
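
To make the conversion concrete, here is a minimal sketch using NumPy alone. It assumes that Table hands each column to NumPy to pick a single type, which I believe is the case but have not traced through for this exact call. The name column_a is just for illustration.

```python
import numpy as np

# Column "A" from the example, after the empty string was replaced by NaN.
column_a = ["CIG1", np.nan, "CIG3"]

# NumPy must pick one dtype for the whole column. With strings present it
# picks a string dtype and converts the float NaN to text.
arr = np.array(column_a)

print(arr.dtype.kind)  # 'U' means a unicode string column
print(repr(arr[1]))    # the second entry is now text, not a float
```

So the 'nan' seen in Topcat is an ordinary three-character string, which is why it shows up in the object name column.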

>
>
> Ok. I can build the mask array in other ways, BUT:
>
> 2) Even when a mask is provided, the column types are misinterpreted. The
> following example illustrates what I mean:
>
>
> list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2],
>         ["CIG3", 2, 3, 2.6, float('nan')]]
>
> mask_empty = [[False, False, False, True, False],
>               [True, False, False, False, False],
>               [False, False, False, False, True]]
> tab = astropy.table.Table(rows=list, names=["A", "B", "C", "D", "E"],
>                           masked=True)
> tab.mask = numpy.array(mask_empty)
> votable = astropy.io.votable.from_table(tab)
> votable.to_xml("/home/sse/Desktop/test.xml")
>
> If you open the resulting VOtable (attached as test2.xml) you will see 
> that the 4th column is typed as char (<FIELD ID="D" arraysize="3" 
> datatype="char" name="D"/>), when the only string in the column is masked!
>
> Maybe I am doing something wrong?

Unfortunately, the determination of column types is independent of
masking. It happens on the line tab=astropy.table.Table(...) above,
before the table is even made aware of the mask. In general, the masked
values need to be of the column type; since they are masked, it
shouldn't matter what the actual values are. So replacing the empty
string in column D with any float (NaN, say) before building the table
should give you a float column.
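
One way to do this, sketched with plain NumPy masked arrays: decide each column's type from its non-missing values only, fill the missing slots with a placeholder of that type, and mask them. The helpers is_missing and masked_column are hypothetical names for this sketch, not astropy functions, and treating "" and NaN as "missing" is an assumption taken from the example data.

```python
import numpy as np

rows = [["CIG1", 1.2, 3, "", 3],
        ["", 3, 2.0, 5, 2],
        ["CIG3", 2, 3, 2.6, float("nan")]]


def is_missing(value):
    """Treat empty strings and float NaN as missing (an assumption)."""
    return value == "" or (isinstance(value, float) and value != value)


def masked_column(values):
    """Build a masked array whose dtype comes from the non-missing values."""
    mask = [is_missing(v) for v in values]
    present = [v for v, m in zip(values, mask) if not m]
    if any(isinstance(v, str) for v in present):
        # String column: an empty string is a harmless placeholder.
        data = ["" if m else v for v, m in zip(values, mask)]
    else:
        # Numeric column: use NaN as the placeholder so the dtype stays float.
        data = [np.nan if m else float(v) for v, m in zip(values, mask)]
    return np.ma.MaskedArray(data, mask=mask)


# Transpose the rows into columns and build one masked array per column.
columns = [masked_column(col) for col in zip(*rows)]
for name, col in zip("ABCDE", columns):
    print(name, col.dtype.kind, col)
```

Passing these columns to astropy.table.Table(columns, names=["A", "B", "C", "D", "E"], masked=True) should then give column D a float type with its first entry masked. I have not checked the resulting VOTable output for this exact case.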

Mike

>
>
>
> 2014-10-15 11:42 GMT+02:00 Thomas Robitaille <thomas.robitaille at gmail.com>:
>
>     Hi Susana,
>
>     Just to simplify your current workflow, what you are doing is
>     equivalent to:
>
>     tab = Table(rows=list, names=...)
>     tab.write('test.xml', format='votable')
>
>     so no need for the zip(*) call and the call to votable.from_table.
>
>     Now in terms of the masked values, I think the easiest is actually to
>     just give some integer value to the missing value and then set the
>     mask, so:
>
>     In [19]: t = Table(rows=list,
>                        names=["A", "B", "C", "D", "E"],
>                        masked=True)
>
>     In [20]: t['D'].mask[0] = True
>
>     In [21]: print(t)
>      A    B   C   D   E
>     ---- --- --- --- ---
>     CIG1 1.2 3.0  -- 3.0
>     CIG1 3.0 2.0 5.0 2.0
>     CIG3 2.0 3.0 2.6 nan
>
>     and then write out to votable with:
>
>     t.write('test.xml', format='votable')
>
>     which should preserve the masks.
>
>     If you choose to use a special value (e.g. np.nan) to indicate masked
>     values, then you can do:
>
>     t['D'].mask = np.isnan(t['D'])
>
>     Let me know if any of the above isn't clear, or if it doesn't
>     solve your issue!
>
>     Cheers,
>     Tom
>
>     Susana Sanchez Exposito wrote:
>     >
>     > Hi all,
>     >
>     > I work on an interface where the user can view and edit tables and
>     > save them as VOTables, and for that I use the Astropy library.
>     >
>     > I keep the data of the tables in python lists, so to save them as
>     > VOtable I do this:
>     >
>     > list = [["CIG1", 1.2, 3, "", 3], ["CIG1", 3, 2.0, 5, 2], ["CIG3", 2,
>     > 3,2.6,  float('nan')]]
>     >
>     > #transform the list of rows into list of columns
>     > list_cols= zip(*list)
>     >
>     > tab=astropy.table.Table(list_cols, names=["A", "B", "C", "D", "E"])
>     > votable=astropy.io.votable.from_table(tab)
>     > votable.to_xml("/home/sse/Desktop/test.xml")
>     >
>     > The table.Table method correctly interprets the type of each column
>     > except for the 4th column. This column contains a "missing value" or
>     > maybe a value deleted by the user, so the whole column is marked as
>     > "string" type, when it is actually float type.
>     >
>     > I could transform all empty strings into NaN, but this would be
>     > strange for those columns containing strings.
>     >
>     > I have tried to transform the python list into a masked array, but
>     > without success: I had problems masking empty strings.
>     >
>     > Maybe I should find out the type of each column, going over the
>     > table and calculating the majority type of the column items, and
>     > then pass this type array to the table.Table method in some way?
>     >
>     > So, before continuing to investigate, I would like to ask you for
>     > some tips to solve the missing values problem with astropy, or
>     > whether there is even a specific method for that.
>     >
>     > Thanks in advance.
>     >
>     > Susana.
>     >
>     >
>     >
>     >
>     >
>     >
>     >
>     > --
>     > Susana Sánchez Expósito
>     >
>     > Instituto de Astrofísica de Andalucía   IAA (CSIC)
>     > Camino Bajo de Huétor, 50. Granada E-18008
>     > Tel:(+34) 958 121 311 / (+34) 958 230 618
>     > Fax:(+34) 958 814 530
>     > e-mail: sse at iaa.es
>     >
>     > _______________________________________________
>     > AstroPy mailing list
>     > AstroPy at scipy.org
>     > http://mail.scipy.org/mailman/listinfo/astropy
>

-- 
Michael Droettboom
Science Software Branch
Space Telescope Science Institute

http://www.droettboom.com
