[AstroPy] table string length truncated after reading table

Aldcroft, Thomas aldcroft at head.cfa.harvard.edu
Fri Aug 16 08:05:14 EDT 2013


(Sent this last night but accidentally just to Josh).  - Tom


Hi Josh,

The problem is that when you read the file back with ascii.read(), it
has no memory of the original 'S50' data type, so it makes each string
column only as wide as the longest value in that column of the ASCII
file.  Then when you add a row, the new longer string does not fit into
the existing column, so it is silently truncated.  (This is the behavior
of the underlying numpy array which is used by Table.)
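
As a rough illustration of that numpy behavior on its own, here is a
minimal sketch that does not use Table at all.  The names col and grown
are made up for the example, and the preallocate-and-copy step is only
an assumption standing in for whatever Table does internally when a row
is added:

```python
import numpy as np

# Two values, the longest being 16 bytes, so numpy infers a fixed-width 'S16' dtype.
col = np.array([b"abcdefghijk", b"abcdefghijklmnop"])

# Make room for one more element while keeping the same dtype (an assumed
# stand-in for appending a row to an existing column).
grown = np.zeros(len(col) + 1, dtype=col.dtype)
grown[:len(col)] = col

# The 20-byte value does not fit in 16 bytes, so numpy silently drops the tail.
grown[-1] = b"abcdefghijklmnopqrst"

print(col.dtype.itemsize)  # 16
print(grown[-1])           # b'abcdefghijklmnop'
```

No error or warning is raised on the assignment, which is why the
truncation is easy to miss.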

Fortunately there is a way to force the data type when reading with
ascii.read(): the converters argument.  In your example below you
would do the read with:

>>> secondTable = ascii.read("mytable.txt", converters={'col1': [ascii.convert_numpy('S50')], 'col2': [ascii.convert_numpy('S50')]})

The idea here is that you are forcing the column to be converted from
a Python list to a numpy array with a dtype of 'S50' instead of the
normal default of guessing float, int, str.
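
To make the difference concrete, here is a plain-numpy sketch of the
two conversions.  It only mimics the list-to-array step described
above and is not the actual ascii.read() code; raw, guessed, forced and
extended are names invented for the example:

```python
import numpy as np

# Values as they might come out of the text file, as a plain Python list.
raw = ["abcdefghijk", "abcdefghijklmnop"]

# Default conversion: the width is inferred from the longest value (16 characters).
guessed = np.array(raw)

# Forced conversion: the width is fixed at 50 bytes regardless of the data.
forced = np.array(raw, dtype="S50")

# With 50 bytes available, a longer value added later fits without truncation.
extended = np.zeros(len(forced) + 1, dtype=forced.dtype)
extended[:len(forced)] = forced
extended[-1] = b"abcdefghijklmnopqrst"

print(forced.dtype.itemsize)  # 50
print(extended[-1])           # b'abcdefghijklmnopqrst'
```

The choice of 50 just mirrors the 'S50' in the original table; any
width at least as long as the longest file name you expect would do.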

See http://astropy.readthedocs.org/en/latest/io/ascii/read.html#converters
for the docs.

- Tom

On Thu, Aug 15, 2013 at 10:51 PM, Matthew Craig <mcraig at mnstate.edu> wrote:
> I ran into this once upon a time too (in atPy).
>
> The type for the columns when you read from a table is being guessed at by
> ascii; in the absence of any other guidance it gives each string column a
> dtype just wide enough for the longest string it finds in that column.
>
> Doesn't look like there is a way to specify type in ascii.read (though I
> just did a quick skim of the docs), but this would work:
>
> ```
> In [35]: thirdTable = table.Table(np.array(secondTable), names=('col1', 'col2'), dtypes=('S50', 'S50'))
>
> In [36]: thirdTable.add_row(('abcdefghijklmnopqrst', 'longer_string'))
>
> In [37]: print thirdTable
>         col1              col2
> -------------------- -------------
>          abcdefghijk  short_string
>     abcdefghijklmnop   long_string
> abcdefghijklmnopqrst longer_string
> ```
>
> Matt Craig
> PS Would have been happy to answer on astrobabel but waiting for my
> membership to be approved :)
>
>
> Office hours/schedule at: http://physics.mnstate.edu/craig
> ----
> Professor
> Department of Physics and Astronomy
> Minnesota State University Moorhead
> 1104 7th Ave S, Moorhead MN 56563
>
> phone: (218) 477-2439
> fax: (218) 477-2290
>
> On Aug 15, 2013, at 8:45 PM, Josh Walawender <jmwalawender at gmail.com> wrote:
>
> Hi all,
>
> I'm having a problem working with astropy.table and astropy.io.ascii and I
> can't tell if the behavior I'm encountering is a feature or a bug. I'm
> hoping someone can guide me to a good solution. Here's the situation:
>
> I have code which loops through a series of input data files, does analysis,
> stores the results in an astropy.table, and writes the table to a text file
> using astropy.io.ascii. One of the fields in the row is the input filename.
> As the code loops through the input files, it reads the previous table as
> output by io.ascii, appends a row to the table object, then overwrites the
> old file with a new one based on the new table which contains the new row.
>
> The symptom is that on all subsequent passes, the strings written to the
> file name field are truncated to whatever length the first file name was.
>
> Here's a quick test case (copied and pasted from iPython) demonstrating the
> problem:
>
> ```
> In [1]: import astropy.table as table
>
> In [2]: import astropy.io.ascii as ascii
>
> In [4]: firstTable = table.Table(names=('col1', 'col2'), dtypes=('S50', 'S50'))
>
> In [5]: firstTable.add_row(('abcdefghijk', 'short_string'))
>
> In [6]: firstTable.add_row(('abcdefghijklmnop', 'long_string'))
>
> In [7]: print(firstTable)
>       col1           col2
> ---------------- ------------
>      abcdefghijk short_string
> abcdefghijklmnop  long_string
>
> In [8]: ascii.write(firstTable, "mytable.txt")
>
> In [9]: secondTable = ascii.read("mytable.txt")
>
> In [10]: print(secondTable)
>       col1           col2
> ---------------- ------------
>      abcdefghijk short_string
> abcdefghijklmnop  long_string
>
> In [11]: secondTable.add_row(('abcdefghijklmnopqrst', 'longer_string'))
>
> In [12]: print(secondTable)
>       col1           col2
> ---------------- ------------
>      abcdefghijk short_string
> abcdefghijklmnop  long_string
> abcdefghijklmnop longer_strin
> ```
>
> Any suggestions on how to avoid this behavior?
>
> thanks!
> Josh
>
> P.S.  Based on the recent discussion on astropy-dev about where to get help,
> I've also posted this on astrobabel:
> http://www.astrobabel.com/v/discussion/77/astropy-question-table-string-length-truncated-after-reading-table#Item_1
>
> _______________________________________________
> AstroPy mailing list
> AstroPy at scipy.org
> http://mail.scipy.org/mailman/listinfo/astropy