Mailman 3 chararray stripping trailing whitespace a bug? - NumPy-Discussion

chararray stripping trailing whitespace a bug?

Neil Crighton

10 May 2010

8:53 p.m.

I've been working with pyfits, which uses numpy chararrays. I've discovered the hard way that chararrays silently remove trailing whitespace:

>>> a = np.array(['a '])
>>> b = a.view(np.chararray)
>>> a[0]
'a '
>>> b[0]
'a'

Note the string values stored in memory are unchanged. This behaviour caused a bug in a program I've been writing, and seems like a bad idea in general. Is it intentional?

Neil
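For readers who want to reproduce the report, here is a minimal sketch of the behaviour described above. It assumes a current NumPy on Python 3, so the strings are unicode rather than bytes, and it spells the class `np.char.chararray` on the assumption that this is the more portable name across NumPy versions. The variable names are illustrative only.

```python
import numpy as np

# A plain ndarray of fixed-width strings keeps the trailing space.
a = np.array(['a '])

# Viewing the same buffer as a chararray changes only how items are returned.
b = a.view(np.char.chararray)

plain_item = a[0]      # trailing space preserved
stripped_item = b[0]   # trailing whitespace stripped on indexing

# Both objects refer to the same memory; nothing was rewritten.
same_buffer = np.shares_memory(a, b)

# Converting back to a base ndarray exposes the stored value again.
raw_again = np.asarray(b)[0]

print(repr(plain_item), repr(stripped_item), same_buffer, repr(raw_again))
```

The stripping happens when an element is read through the chararray, which is why the stored data and the indexed value can disagree.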


Warren Weckesser

From the chararray docstring:

    Versus a regular Numpy array of type `str` or `unicode`, this class adds the following functionality:

    1) values automatically have whitespace removed from the end when indexed

So I guess it is a feature, not a bug. :)

Warren

Christopher Hanley

9:10 p.m.


This is an intentional "feature", not a bug.

Chris

--
Christopher Hanley
Senior Systems Software Engineer
Space Telescope Science Institute
3700 San Martin Drive
Baltimore MD, 21218
(410) 338-4338

Neil Crighton

10:50 p.m.

> This is an intentional "feature", not a bug.

Ah, ok, thanks. I missed the explanation in the doc string because I'm using version 1.3 and forgot to check the web docs.

For the record, this was my bug: I read a fits binary table with pyfits. One of the table fields was a chararray containing a bunch of flags ('A','B','C','D'). I tried to use in1d() to identify all entries with flags of 'C' or 'D'. So

>>> c = pyfits_table.chararray_column
>>> mask = np.in1d(c, ['C', 'D'])

It turns out the actual stored values in the chararray were 'A ', 'B ', 'C ' and 'D '. in1d() converts the chararray to an ndarray before performing the comparison, so none of the entries matches 'C' or 'D'.

What is the best way to ensure this doesn't happen to other people? We could change the array set operations to special-case chararrays, but this seems like an ugly solution. Is it possible to change something in pyfits to avoid this?

Neil
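The mismatch described above can be sketched without pyfits. The `flags` array below is a hypothetical stand-in for the table column, holding the padded values as a plain ndarray. The sketch uses `np.isin`, the newer equivalent of `in1d()` in recent NumPy releases. One possible workaround, not something proposed in the thread itself, is to strip the padding explicitly with `np.char.rstrip` before testing membership.

```python
import numpy as np

# Hypothetical stand-in for the flag column: values padded to width 2.
flags = np.array(['A ', 'B ', 'C ', 'D ', 'C '])

# On a plain ndarray the trailing space is significant, so nothing matches.
mask_raw = np.isin(flags, ['C', 'D'])

# Stripping trailing whitespace explicitly makes the comparison behave
# the way the stripped, indexed values suggested it would.
mask_stripped = np.isin(np.char.rstrip(flags), ['C', 'D'])

print(mask_raw)
print(mask_stripped)
```

Making the strip explicit keeps the result independent of whether the input arrives as a chararray or as a regular ndarray.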

Michael Droettboom

11 May 2010

2:21 a.m.

Also from the docstring:

"""
.. note::
   The `chararray` class exists for backwards compatibility with Numarray, it is not recommended for new development. Starting from numpy 1.4, if one needs arrays of strings, it is recommended to use arrays of `dtype` `object_`, `string_` or `unicode_`, and use the free functions in the `numpy.char` module for fast vectorized string operations.
"""

Neil Crighton wrote:

> It turns out the actual stored values in the chararray were 'A ', 'B ', 'C ' and 'D '. in1d() converts the chararray to an ndarray before performing the comparison, so none of the entries matches 'C' or 'D'.

This inconsistency is fixed in Numpy 1.4 (which included a major overhaul of chararrays). in1d will perform the auto whitespace-stripping on chararrays, but not on regular ndarrays of strings.

> What is the best way to ensure this doesn't happen to other people? We could change the array set operations to special-case chararrays, but this seems like an ugly solution. Is it possible to change something in pyfits to avoid this?

Pyfits continues to use chararray since not doing so would break existing code relying on this behavior. And there are many use cases where this behavior is desirable, particularly with fixed-length strings in tables.

The best way to get around it from your code is to cast the chararray pyfits returns to a regular ndarray. The cast does not perform a copy, so should be very efficient:

In [6]: from numpy import char

In [7]: import numpy as np

In [8]: c = char.array(['a ', 'b '])

In [9]: c
Out[9]: chararray(['a', 'b'], dtype='|S2')

In [10]: np.asarray(c)
Out[10]: array(['a ', 'b '], dtype='|S2')

I suggest casting to either chararray or ndarray depending on whether you want the auto-whitespace-stripping behavior.

Mike

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA
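The no-copy claim can be checked directly. The sketch below assumes a current NumPy on Python 3 and uses bytes literals so the dtype matches the `|S2` shown in the session above. The names `plain` and `back` are illustrative, not part of any API.

```python
import numpy as np

# Build a bytes chararray, mirroring the '|S2' dtype in the session above.
c = np.char.array([b'a ', b'b '])

# Cast to a base-class ndarray; this wraps the same buffer without copying.
plain = np.asarray(c)

is_plain_ndarray = type(plain) is np.ndarray
shares = np.shares_memory(c, plain)

# The ndarray returns the stored bytes, the chararray strips on indexing.
stored = plain[0]
stripped = c[0]

# Going back the other way is also just a view.
back = plain.view(np.char.chararray)
round_trip = back[1]

print(is_plain_ndarray, shares, stored, stripped, round_trip)
```

Because both directions are views, switching between the two behaviours costs no extra memory even for large table columns.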

Neil Crighton

3:44 a.m.

> This inconsistency is fixed in Numpy 1.4 (which included a major overhaul of chararrays). in1d will perform the auto whitespace-stripping on chararrays, but not on regular ndarrays of strings.

Great, thanks.

> Pyfits continues to use chararray since not doing so would break existing code relying on this behavior. And there are many use cases where this behavior is desirable, particularly with fixed-length strings in tables.
>
> The best way to get around it from your code is to cast the chararray pyfits returns to a regular ndarray.

My problem was I didn't know I needed to get around it :) But thanks for the suggestion, I'll use that in future when I need to switch between chararrays and ndarrays.

Neil


participants (4)

Christopher Hanley
Michael Droettboom
Neil Crighton
Warren Weckesser
