[AstroPy] saving tables as VOtables with missing values

Susana Sanchez Exposito sse at iaa.es
Thu Oct 16 05:39:00 EDT 2014


2014-10-15 18:37 GMT+02:00 Michael Droettboom <mdroe at stsci.edu>:

>  On 10/15/2014 09:20 AM, Susana Sanchez Exposito wrote:
>
>    Thanks for your answer Tom!
>
>  I have tested what you proposed, and I found two things:
>
>  1) If you replace the empty string by a value (e.g. np.nan) in order to
> get a mask array, this value will remain in the resulting VOTable, and in
> some cases it could be strange. I illustrate this with this example (note
> the first column of the table is the object name):
>
> import numpy
> import astropy.table
> import astropy.io.votable
>
> list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2], ["CIG3", 2, 3, 2.6, float('nan')]]
> to_nan = []
> for r in list:
>     to_nan.append([numpy.nan if (str(x).upper() == "NAN" or x == "") else x for x in r])
>
> tab = astropy.table.Table(rows=to_nan, names=["A", "B", "C", "D", "E"], masked=True)
> votable = astropy.io.votable.from_table(tab)
> votable.to_xml("/home/sse/Desktop/test.xml")
>
> If you open the resulting VOtable (attached as test.xml) in Topcat you
> will see the 'nan' string in the object name column, which is very strange.
>
>   The loop you provide above is putting the string “nan”, not the
> floating-point value NaN in the column — since the column is a string
> column, there’s really nothing else it can do. Signs point to using your
> approach below, with an explicit mask, for this reason.
>
mmm, I am pretty sure the loop above is putting the float value NaN. This
ipython output shows that:


In [2]: list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2], ["CIG3", 2, 3, 2.6, float('nan')]]
In [3]: to_nan = []
In [4]: for r in list:
   ...:     to_nan.append([numpy.nan if (str(x).upper() == "NAN" or x == "") else x for x in r])
   ...:
In [5]: print to_nan
[['CIG1', 1.2, 3, nan, 3], [nan, 3, 2.0, 5, 2], ['CIG3', 2, 3, 2.6, nan]]

Or maybe you mean something else?
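The disagreement above can be reconciled with plain NumPy: the Python list really does hold the float NaN, but a column that mixes strings and floats has to be stored with a string dtype, and the NaN then becomes the text "nan". A minimal sketch, assuming astropy.table.Table falls back to NumPy's usual conversion when it builds each column (the variable names are illustrative only):

```python
import numpy as np

# The rows after the loop above: the empty cells already hold float NaN.
to_nan = [["CIG1", 1.2, 3, np.nan, 3],
          [np.nan, 3, 2.0, 5, 2],
          ["CIG3", 2, 3, 2.6, np.nan]]

# Transpose rows into columns (column A = object names, column D = numbers).
col_a, _, _, col_d, _ = zip(*to_nan)

# Column A mixes str and float, so NumPy chooses a string dtype and the float
# NaN is converted to the three-character string "nan".
arr_a = np.array(col_a)
print(arr_a.dtype.kind, arr_a.tolist())  # U ['CIG1', 'nan', 'CIG3']

# Column D is purely numeric, so NaN survives as a real floating-point value.
arr_d = np.array(col_d)
print(arr_d.dtype, np.isnan(arr_d).tolist())  # float64 [True, False, False]
```

So both statements hold: the loop inserts a float, and the string column then stores it as "nan".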



>
>
>  Ok. I can build the mask array in other ways, BUT:
>
>  2) Even when I provide a mask, the column types are misinterpreted. The
> following example illustrates what I mean:
>
>
> list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2], ["CIG3", 2, 3, 2.6, float('nan')]]
>
> mask_empty = [[False, False, False, True, False],
>               [True, False, False, False, False],
>               [False, False, False, False, True]]
> tab = astropy.table.Table(rows=list, names=["A", "B", "C", "D", "E"], masked=True)
> tab.mask = numpy.array(mask_empty)
> votable = astropy.io.votable.from_table(tab)
> votable.to_xml("/home/sse/Desktop/test.xml")
>
>  If you open the resulting VOtable (attached as test2.xml) you will see
> that the 4th column is typed as char (<FIELD ID="D" arraysize="3"
> datatype="char" name="D"/>), when the only string in the column is masked!
>
>  Maybe I am doing something wrong?
>
>   The determination of column types is independent of masking,
> unfortunately, and happens on the line tab=astropy.table.Table(... above
> before the table is even made aware of the mask. In general, the masked
> values need to be of the column type. But since it is masked, it shouldn’t
> matter what that actual value is.
>
OK. This is what I was imagining.

Thanks for your answer, Mike!
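Following Mike's point that the column type is fixed before the mask is applied, one workaround is to decide each column's type from its non-missing cells only, and to put a placeholder of that type under the mask. A sketch under stated assumptions: build_masked_column is a hypothetical helper (not an astropy function), and both empty strings and float NaN are treated as missing, which matches mask_empty above:

```python
import math

import numpy as np


def build_masked_column(values):
    """Hypothetical helper: return (data, mask) for one column of cells.

    Empty strings and float NaN are treated as missing. The column type is
    inferred from the remaining cells only, and each missing cell gets a
    placeholder of that type (its value does not matter once it is masked).
    """
    def is_missing(v):
        return (isinstance(v, str) and v == "") or (isinstance(v, float) and math.isnan(v))

    mask = np.array([is_missing(v) for v in values], dtype=bool)
    present = [v for v, m in zip(values, mask) if not m]
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present)
    if numeric:
        # Numeric column: 0.0 is an arbitrary placeholder under the mask.
        data = np.array([0.0 if m else float(v) for v, m in zip(values, mask)])
    else:
        # String column: keep "" as the placeholder under the mask.
        data = np.array(["" if m else str(v) for v, m in zip(values, mask)])
    return data, mask


rows = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2], ["CIG3", 2, 3, 2.6, float("nan")]]
names = ["A", "B", "C", "D", "E"]

# Transpose rows into columns, then build one (data, mask) pair per column.
columns = {name: build_masked_column(list(col)) for name, col in zip(names, zip(*rows))}

data_d, mask_d = columns["D"]
print(data_d.dtype, mask_d.tolist())  # float64 [True, False, False]
```

Each (data, mask) pair could then be wrapped as astropy.table.MaskedColumn(data=data, mask=mask, name=name) and the columns collected into a Table before calling write(..., format='votable'); that last step is not exercised in the sketch.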


> Mike
>
>
>
>
> 2014-10-15 11:42 GMT+02:00 Thomas Robitaille <thomas.robitaille at gmail.com>:
>
>> Hi Susana,
>>
>> Just to simplify your current workflow, what you are doing is equivalent
>> to:
>>
>> tab = Table(rows=list, names=...)
>> tab.write('test.xml', format='votable')
>>
>> so no need for the zip(*) call and the call to votable.from_table.
>>
>> Now in terms of the masked values, I think the easiest is actually to
>> just give some integer value to the missing value and then set the mask,
>> so:
>>
>> In [19]: t = Table(rows=list,
>>                    names=["A", "B", "C", "D", "E"],
>>                    masked=True)
>>
>> In [20]: t['D'].mask[0] = True
>>
>> In [21]: print(t)
>>  A    B   C   D   E
>> ---- --- --- --- ---
>> CIG1 1.2 3.0  -- 3.0
>> CIG1 3.0 2.0 5.0 2.0
>> CIG3 2.0 3.0 2.6 nan
>>
>> and then write out to votable with:
>>
>> t.write('test.xml', format='votable')
>>
>> which should preserve the masks.
>>
>> If you choose to use a special value (e.g. np.nan) to indicate masked
>> values, then you can derive the mask from the data itself:
>>
>> t['D'].mask = np.isnan(t['D'])
>>
>> Let me know if any of the above isn't clear, or if it doesn't solve your
>> issue!
>>
>> Cheers,
>> Tom
>>
>> Susana Sanchez Exposito wrote:
>> >
>> > Hi all,
>> >
>> > I work on an interface where the user can view and edit tables and save
>> > them as VOtables, and for that I use the Astropy library.
>> >
>> > I keep the data of the tables in python lists, so to save them as
>> > VOtable I do this:
>> >
>> > list = [["CIG1", 1.2, 3, "", 3], ["CIG1", 3, 2.0, 5, 2], ["CIG3", 2, 3, 2.6, float('nan')]]
>> >
>> > #transform the list of rows into list of columns
>> > list_cols= zip(*list)
>> >
>> > tab=astropy.table.Table(list_cols, names=["A", "B", "C", "D", "E"])
>> > votable=astropy.io.votable.from_table(tab)
>> > votable.to_xml("/home/sse/Desktop/test.xml")
>> >
>> > The table.Table method correctly interprets the type of each column
>> > except for the 4th column (D). This column contains a "missing value"
>> > (or maybe a value deleted by the user), so the whole column is marked as
>> > "string" type, when it is actually of float type.
>> >
>> > I could transform all empty strings into NaN, but this would be strange
>> > for those columns containing strings.
>> >
>> > I have tried to transform the Python list into a masked array, but
>> > without success: I had problems masking the empty strings.
>> >
>> > Maybe I should find out the type of each column by going over the
>> > table and determining the type of the majority of the items in each
>> > column, and then pass this array of types to the table.Table method in
>> > some way?
>> >
>> > So, before continuing to investigate, I would like to ask you for some
>> > tips on solving the missing-values problem with astropy, or whether
>> > there is even a specific method for that.
>> >
>> > Thanks in advance.
>> >
>> > Susana.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > --
>> > Susana Sánchez Expósito
>> >
>> > Instituto de Astrofísica de Andalucía   IAA (CSIC)
>> > Camino Bajo de Huétor, 50. Granada E-18008
>> > Tel:(+34) 958 121 311 / (+34) 958 230 618
>> > Fax:(+34) 958 814 530
>> > e-mail: sse at iaa.es
>> >
>> > _______________________________________________
>> > AstroPy mailing list
>> > AstroPy at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/astropy
>>
>
>
>
>
>>
> --
> Michael Droettboom
> Science Software Branch
> Space Telescope Science Institute
> http://www.droettboom.com
>
>
>
>
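Tom's alternative in the quoted thread, using a special value such as np.nan and deriving the mask from the data, can be tried on a plain NumPy masked array (astropy's MaskedColumn is a subclass of numpy.ma.MaskedArray). A minimal sketch; the array d is an illustrative stand-in for column D after its empty cell was replaced by NaN:

```python
import numpy as np

# Column D after the empty cell has been replaced by NaN as a sentinel.
d = np.array([np.nan, 5.0, 2.6])

# Derive the mask from the data values themselves, not from an existing mask.
masked = np.ma.array(d, mask=np.isnan(d))
print(masked.mask.tolist())  # [True, False, False]

# np.ma.masked_invalid masks NaN (and +/-inf) in a single call.
same = np.ma.masked_invalid(d)
print(same.mask.tolist())    # [True, False, False]
```

This only works for numeric columns; for a string column such as A there is no NaN to test, which is why an explicit mask is the safer route there.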



