[Numpy-discussion] Massive differences in numpy vs. numeric string handling

Wed Apr 12 14:30:05 EDT 2006

In Numeric:

Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,)
Numeric.array(['test','two']) ->
array([[t, e, s, t],
        [t, w, o,  ]],'c')

but in numpy:

numpy.array('test') -> array('test', dtype='|S4'); shape = ()
numpy.array('test','S1') -> array('t', dtype='|S1'); shape = ()

In fact, you have to add an extra list conversion:

numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); shape = (4,)

to get the desired result.  I don't think this is very Pythonic, since
strings are fully indexable and iterable objects.  Furthermore,
treating a string as an array of characters is a very common thing to
do.  convertcode.py does not appear to convert this part of the code
correctly either.  Also, the repr is inconsistent: the shape () array
shows its element in quotes, but the shape (4,) array does not.
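
For reference, a minimal sketch of two ways to get a shape (4,) character array from a string. It assumes a recent NumPy release on Python 3, not the version discussed in this message, so the names and behaviour below are illustrative only. The variable names are made up for the example.

```python
import numpy as np

text = "test"

# Passing the string directly yields a 0-d array holding the whole string.
scalar = np.array(text)

# Option 1: split the string into a list first (the workaround described above).
chars_from_list = np.array(list(text), dtype="S1")

# Option 2: reinterpret the encoded bytes, one byte per element.
# np.frombuffer returns a read-only view, so copy it if the array must be modified.
chars_from_buffer = np.frombuffer(text.encode("ascii"), dtype="S1").copy()

print(scalar.shape)
print(chars_from_list.shape)
print(chars_from_buffer.shape)
```

Option 2 avoids building an intermediate Python list, which may matter for long sequences, but it only works for text that fits a single-byte encoding.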

I realize that the ability to use strings of arbitrary length as array
elements is important in numpy, but there really should be a more
natural way to convert strings to character arrays.
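
The two-word Numeric example near the top can be approximated along the following lines. This is only a sketch, again assuming a recent NumPy on Python 3. It pads short words with null bytes, which may differ from the padding Numeric used.

```python
import numpy as np

words = ["test", "two"]
width = max(len(w) for w in words)

# One fixed-width byte string per word; shorter words are padded with null bytes.
fixed = np.array(words, dtype=f"S{width}")

# Reinterpret each width-byte element as `width` one-byte elements,
# then restore the one-row-per-word structure.
grid = fixed.view("S1").reshape(len(words), width)

print(grid.shape)
print(grid[0].tolist())
```

The view shares memory with `fixed`, so no characters are copied. The padded position in the second row reads back as an empty byte string.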

Also, unlike Numeric.equal with 'c' arrays, numpy.equal cannot compare
'|S1' arrays (or, presumably, other string arrays) for equality, even
though this is a very useful comparison to make.
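
One possible workaround, sketched under the assumption of a recent NumPy on Python 3, is to compare the underlying byte codes instead of the string elements, since numpy.equal does accept integer arrays. The helper `to_codes` and the two sequences are hypothetical, chosen only to illustrate the idea.

```python
import numpy as np


def to_codes(seq):
    """Return the ASCII codes of seq as a 1-D uint8 array (hypothetical helper)."""
    return np.frombuffer(seq.encode("ascii"), dtype=np.uint8)


a = to_codes("GATTACA")
b = to_codes("GACTATA")

# Elementwise equality on the byte codes, position by position.
matches = np.equal(a, b)

# Number of positions where the two sequences differ.
mismatches = int(np.count_nonzero(~matches))

print(matches.tolist())
print(mismatches)
```

This only makes sense for sequences of equal length in a single-byte encoding.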

For the record, I have used the Numeric (and to a lesser degree the  
numarray) module extensively in bioinformatics applications for its  
speed and brevity.

Jeremy