[Numpy-discussion] [SciPy-dev] Deprecate chararray [was Plea for help]

Michael Droettboom mdroe at stsci.edu
Tue Sep 29 13:55:24 EDT 2009


I now have a rather large patch ready which addresses the following 
issues with chararrays.  Would it be possible to get SVN commit 
privileges, or would you prefer a patch file?

1) Fix bugs in Trac

http://projects.scipy.org/numpy/ticket/1199 (chararray.expandtabs broken)
http://projects.scipy.org/numpy/ticket/856 (chararray __mod__ error)
http://projects.scipy.org/numpy/ticket/855 (chararray __mul__ error)
http://projects.scipy.org/numpy/ticket/1231 (chararray methods ignore 
all arguments following the first argument that evaluates to False)
http://projects.scipy.org/numpy/ticket/1235 (Coercing object arrays to 
string arrays has surprising behaviour)
http://projects.scipy.org/numpy/ticket/1240 (Casting from Unicode to 
String array ignores exception)
http://projects.scipy.org/numpy/ticket/1241 (Array constructed with 
mixture of str and unicode objects fails length detection)

I can provide small individual patches for some of these if necessary, 
but some are interrelated and can only be fixed by the "whole enchilada".

2) Improve documentation

Every method now has a docstring, and a new page of routines has been 
added to the Sphinx tree.

3) Improve unit test coverage

Full line-by-line coverage of defchararray.py, as well as lots of hairy 
Unicode edge cases.

4a) Create C-based vectorized string operations

In benchmarks, this is about 5x faster than the old Python-based looping 
on a large database of around 20k astronomical objects.

4b) Refactor chararray class in terms of those

4c) Design and create an interface to those methods that will be the
"right way" going forward

All vectorized string operations are now available as regular functions 
in the numpy.char namespace.  Usage of the chararray view class is only 
recommended for numarray backward compatibility.
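A minimal sketch of what using the free functions on a plain ndarray looks like, next to the kind of Python-level loop they replace. It is written against a current NumPy rather than the SVN tree under discussion, and the catalog names are made up for illustration. The timing lines only show how such a comparison might be set up; they are not a reproduction of the 5x figure quoted above, and results will vary by machine and NumPy version.

```python
import timeit

import numpy as np

# Made-up object names, repeated to roughly the size mentioned above (~20k).
names = np.array(['  m31 ', 'ngc 224', 'Andromeda'] * 7000)

# Free functions in numpy.char take and return plain ndarrays;
# no chararray view is needed.
cleaned = np.char.strip(names)
upper = np.char.upper(cleaned)
is_ngc = np.char.startswith(upper, 'NGC')


def python_loop(arr):
    """Same work done element by element in Python, for comparison."""
    return np.array([s.strip().upper() for s in arr])


# Both approaches should agree element-wise.
assert (python_loop(names) == upper).all()

t_vec = timeit.timeit(lambda: np.char.upper(np.char.strip(names)), number=10)
t_loop = timeit.timeit(lambda: python_loop(names), number=10)
print(f"vectorized: {t_vec:.3f}s  loop: {t_loop:.3f}s")
```

The chararray view remains available for numarray-style code, but nothing in the snippet depends on it.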

A few side notes:

http://projects.scipy.org/numpy/ticket/1200 (chararray.rstrip inconsistency)

I believe this bug should be marked as "won't fix".  The inconsistent 
handling of trailing whitespace is an unfortunate "feature" of the 
chararray class, and I am wary that fixing it may break backward 
compatibility.  However, the new free functions in numpy.char do not 
have this inconsistency, so they should be recommended for new code.
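For readers unfamiliar with the trailing-whitespace behavior referred to above, here is a small sketch of how the legacy chararray view differs from a plain ndarray used with explicit numpy.char calls. It illustrates the general chararray behavior and is not a reproduction of ticket 1200 itself. It assumes a current NumPy, where np.char.array still builds a chararray view.

```python
import numpy as np

data = ['abc  ', 'de ']

# Legacy chararray view: trailing whitespace is dropped when an element
# is read back, and == ignores trailing whitespace as well.
legacy = np.char.array(data)
print(legacy[0] == 'abc')          # True: element comes back as 'abc'
print((legacy == 'abc').tolist())  # [True, False]

# Plain ndarray: the stored value is returned unchanged, so whitespace
# stripping has to be requested explicitly.
plain = np.array(data)
print(plain[0] == 'abc  ')                        # True: spaces are kept
print((plain == 'abc').tolist())                  # [False, False]
print((np.char.rstrip(plain) == 'abc').tolist())  # [True, False]
```

The explicit rstrip call makes the stripping visible at the call site instead of hiding it in indexing and comparison.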

http://projects.scipy.org/numpy/ticket/1240 (Casting from Unicode to 
String array ignores exception)

The fix for this bug probably needs review by someone deeply familiar 
with the low-level internals, as it affects more than just string and 
unicode arrays.  It doesn't break any of the unit tests, for what it's 
worth ;)

Cheers,
Mike

David Goldsmith wrote:
> Great, thanks!
>
> DG
>
> On Fri, Sep 25, 2009 at 6:07 AM, Michael Droettboom <mdroe at stsci.edu> wrote:
>
>     David Goldsmith wrote:
>     > On Tue, Sep 22, 2009 at 4:02 PM, Ralf Gommers
>     > <ralf.gommers at googlemail.com> wrote:
>     >
>     >
>     >     On Tue, Sep 22, 2009 at 1:58 PM, Michael Droettboom
>     >     <mdroe at stsci.edu> wrote:
>     >
>     >         Trac has these bugs.  Any others?
>     >
>     >         http://projects.scipy.org/numpy/ticket/1199
>     >         http://projects.scipy.org/numpy/ticket/1200
>     >         http://projects.scipy.org/numpy/ticket/856
>     >         http://projects.scipy.org/numpy/ticket/855
>     >         http://projects.scipy.org/numpy/ticket/1231
>     >
>     >
>     >     This one:
>     >     http://article.gmane.org/gmane.comp.python.numeric.general/23638/match=chararray
>     >
>     >     Cheers,
>     >     Ralf
>     >
>     >
>     > That last one never got "promoted" to a ticket?
>     It's a symptom of this bug, which I created and produced a patch for
>     yesterday:
>
>     http://projects.scipy.org/numpy/ticket/1235
>
>     Mike
>
>
>     --
>     Michael Droettboom
>     Science Software Branch
>     Operations and Engineering Division
>     Space Telescope Science Institute
>     Operated by AURA for NASA
>
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at scipy.org
>     http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
