[AstroPy] saving tables as VOtables with missing values

Susana Sanchez Exposito sse at iaa.es
Wed Oct 15 09:20:03 EDT 2014


Thanks for your answer Tom!

I have tested what you proposed, and I found two things:

1) If you replace the empty string with a placeholder value (e.g. np.nan) in
order to get a masked array, that value remains in the resulting VOTable,
which can look strange in some cases. The following example illustrates this
(note that the first column of the table is the object name):

import numpy
import astropy.table
import astropy.io.votable

rows = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2],
        ["CIG3", 2, 3, 2.6, float('nan')]]
# Replace empty strings and NaN-like values with numpy.nan
to_nan = []
for r in rows:
    to_nan.append([numpy.nan if (str(x).upper() == "NAN" or x == "") else x
                   for x in r])

tab = astropy.table.Table(rows=to_nan, names=["A", "B", "C", "D", "E"],
                          masked=True)
votable = astropy.io.votable.from_table(tab)
votable.to_xml("/home/sse/Desktop/test.xml")

If you open the resulting VOTable (attached as test.xml) in TOPCAT, you will
see the string 'nan' in the object name column, which is very strange.
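
One way around this, as a minimal sketch rather than an astropy recipe: derive the mask column by column and use a placeholder that matches each column's type (an empty string for text columns, NaN for numeric ones), so that 'nan' never ends up in the object name column. The helper names is_missing and rows_to_masked_columns are hypothetical, and the rule that empty strings and NaN mean "missing" is simply the convention of the example above.

```python
import math

import numpy as np


def is_missing(value):
    """Treat empty strings and float NaN as missing (assumed convention)."""
    if isinstance(value, str):
        return value == ""
    return isinstance(value, float) and math.isnan(value)


def rows_to_masked_columns(rows, names):
    """Return {name: numpy masked array}, one entry per column (hypothetical helper)."""
    columns = {}
    for name, values in zip(names, zip(*rows)):
        mask = [is_missing(v) for v in values]
        present = [v for v, m in zip(values, mask) if not m]
        # A column counts as text only if one of its present values is a string.
        is_text = any(isinstance(v, str) for v in present)
        placeholder = "" if is_text else np.nan
        data = [placeholder if m else v for v, m in zip(values, mask)]
        dtype = str if is_text else float
        columns[name] = np.ma.MaskedArray(np.array(data, dtype=dtype), mask=mask)
    return columns


rows = [["CIG1", 1.2, 3, "", 3],
        ["", 3, 2.0, 5, 2],
        ["CIG3", 2, 3, 2.6, float("nan")]]
columns = rows_to_masked_columns(rows, ["A", "B", "C", "D", "E"])

for name, col in columns.items():
    print(name, col.dtype, col.mask.tolist())
```

The resulting masked arrays could then be passed to astropy.table.Table as a list of columns, e.g. Table(list(columns.values()), names=list(columns)). How the VOTable writer serialises the masked string entries is not checked here.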


OK, I can build the masked array in other ways, BUT:

2) Even when a mask is provided, the column types are misinterpreted. The
following example illustrates what I mean:


import numpy
import astropy.table
import astropy.io.votable

rows = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2],
        ["CIG3", 2, 3, 2.6, float('nan')]]

# True marks a missing value (one inner list per row)
mask_empty = [[False, False, False, True, False],
              [True, False, False, False, False],
              [False, False, False, False, True]]
tab = astropy.table.Table(rows=rows, names=["A", "B", "C", "D", "E"],
                          masked=True)
tab.mask = numpy.array(mask_empty)
votable = astropy.io.votable.from_table(tab)
votable.to_xml("/home/sse/Desktop/test.xml")

If you open the resulting VOTable (attached as test2.xml), you will see that
the 4th column is typed as char (<FIELD ID="D" arraysize="3"
datatype="char" name="D"/>), even though the only string in the column is
masked!

Maybe I am doing something wrong?
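
A possible explanation, sketched with plain NumPy on the assumption that the table constructor infers each column's dtype the way NumPy does: the dtype is chosen from all the values in the column before any mask is applied, so a single empty string is enough to turn the whole column into strings, and masking that entry afterwards does not change the dtype. That would also be consistent with arraysize="3", since "2.6" is three characters long.

```python
import numpy as np

# Column D from the example above, as plain Python values.
column_d = ["", 5, 2.6]

# NumPy picks a single dtype for the whole column; the empty string
# forces a string dtype even though the other values are numbers.
as_array = np.array(column_d)

# Masking the empty string afterwards leaves the dtype untouched.
masked = np.ma.MaskedArray(as_array, mask=[True, False, False])

# With a numeric placeholder in the masked slot the column stays float.
numeric = np.ma.MaskedArray([np.nan, 5, 2.6], mask=[True, False, False])

print(as_array.dtype, masked.dtype, numeric.dtype)
```

If this is what happens, the masked slot needs a placeholder of the column's intended type before the table is built, not only a mask afterwards.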



2014-10-15 11:42 GMT+02:00 Thomas Robitaille <thomas.robitaille at gmail.com>:

> Hi Susana,
>
> Just to simplify your current workflow, what you are doing is equivalent
> to:
>
> tab = Table(rows=list, names=...)
> tab.write('test.xml', format='votable')
>
> so no need for the zip(*) call and the call to votable.from_table.
>
> Now in terms of the masked values, I think the easiest is actually to
> just give some integer value to the missing value and then set the mask,
> so:
>
> In [19]: t = Table(rows=list,
>                    names=["A", "B", "C", "D", "E"],
>                    masked=True)
>
> In [20]: t['D'].mask[0] = True
>
> In [21]: print(t)
>  A    B   C   D   E
> ---- --- --- --- ---
> CIG1 1.2 3.0  -- 3.0
> CIG1 3.0 2.0 5.0 2.0
> CIG3 2.0 3.0 2.6 nan
>
> and then write out to votable with:
>
> t.write('test.xml', format='votable')
>
> which should preserve the masks.
>
> If you choose to use a special value (e.g. np.nan) to indicate masked
> values, then you can do:
>
> t['D'].mask = np.isnan(t['D'])
>
> Let me know if any of the above isn't clear, or if it doesn't solve your
> issue!
>
> Cheers,
> Tom
>
> Susana Sanchez Exposito wrote:
> >
> > Hi all,
> >
> > I work on an interface where the user can view and edit tables and save
> > them as VOtables, and for that I use the Astropy library.
> >
> > I keep the data of the tables in Python lists, so to save them as a
> > VOTable I do this:
> >
> > list = [["CIG1", 1.2, 3, "", 3], ["CIG1", 3, 2.0, 5, 2], ["CIG3", 2,
> > 3,2.6,  float('nan')]]
> >
> > #transform the list of rows into list of columns
> > list_cols= zip(*list)
> >
> > tab=astropy.table.Table(list_cols, names=["A", "B", "C", "D", "E"])
> > votable=astropy.io.votable.from_table(tab)
> > votable.to_xml("/home/sse/Desktop/test.xml")
> >
> > The table.Table method correctly infers the type of each column
> > except for the 4th column (D). This column contains a "missing value", or
> > maybe a value deleted by the user, so the whole column is marked as
> > "string" type when it is actually float type.
> >
> > I could transform all empty strings into NaN, but this would be strange
> > for those columns containing strings.
> >
> > I have tried to transform the Python list into a masked array, but
> > without success: I had problems masking the empty strings.
> >
> > Maybe I should find out the type of each column by going over the
> > table and taking the type of the majority of the items in the column, and
> > then pass this array of types to the table.Table method in some way?
> >
> > So, before I continue investigating, I would like to ask you for some
> > tips on solving the missing-values problem with astropy, or whether
> > there is a specific method for that.
> >
> > Thanks in advance.
> >
> > Susana.
> >
> > --
> > Susana Sánchez Expósito
> >
> > Instituto de Astrofísica de Andalucía   IAA (CSIC)
> > Camino Bajo de Huétor, 50. Granada E-18008
> > Tel:(+34) 958 121 311 / (+34) 958 230 618
> > Fax:(+34) 958 814 530
> > e-mail: sse at iaa.es <mailto:sse at iaa.es>
> >
> > _______________________________________________
> > AstroPy mailing list
> > AstroPy at scipy.org
> > http://mail.scipy.org/mailman/listinfo/astropy
>



-- 
Susana Sánchez Expósito

Instituto de Astrofísica de Andalucía   IAA (CSIC)
Camino Bajo de Huétor, 50. Granada E-18008
Tel:(+34) 958 121 311 / (+34) 958 230 618
Fax:(+34) 958 814 530
e-mail: sse at iaa.es
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.xml
Type: text/xml
Size: 1033 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/astropy/attachments/20141015/95006068/attachment-0002.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test2.xml
Type: text/xml
Size: 1016 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/astropy/attachments/20141015/95006068/attachment-0003.xml>