[Numpy-discussion] Bug in rec.fromarrays ; plus one other possible bug

Dan Yamins dyamins at gmail.com
Wed Nov 25 09:48:50 EST 2009

Hi, I'm writing to report what look like two bugs in the handling of
strings of length 0.  (I'm using 1.4.0.dev7746 on Mac OS X 10.5.8.  The
problems below occur with both Python 2.5 compiled 32-bit and Python 2.6
compiled 64-bit.)

Bug #1:
A problem arises when you try to create a record array while passing a
string type of length 0.  Take these columns:

>>> Cols = [['test']*10,['']*10]

When no dtype is passed, the recarray is created with no problem:

>>> np.rec.fromarrays(Cols)
rec.array([('test', ''), ('test', ''), ('test', ''), ('test', ''),
       ('test', ''), ('test', ''), ('test', ''), ('test', ''),
       ('test', ''), ('test', '')],
      dtype=[('f0', '|S4'), ('f1', '|S1')])

However, trouble arises when I try to pass a length-0 dtype explicitly: in
every record after the first, the leading character of the first field comes
back as '\x00'.

>>> d = np.dtype([('A', '|S4'), ('B', '|S')])
>>> np.rec.fromarrays(Cols,dtype=d)
rec.array([('test', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', '')],
      dtype=[('A', '|S4'), ('B', '|S0')])

The same thing occurs if I cast to np arrays before passing to
np.rec.fromarrays:

>>> _arrays = [np.array(Cols[0],'|S4'),np.array(Cols[1],'|S')]
>>> _arrays
[array(['test', 'test', 'test', 'test', 'test', 'test', 'test', 'test',
       'test', 'test'],
      dtype='|S4'),
 array(['', '', '', '', '', '', '', '', '', ''],
      dtype='|S1')]

>>> np.rec.fromarrays(_arrays,dtype=d)
rec.array([('test', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', ''), ('\x00est', ''), ('\x00est', ''),
       ('\x00est', ''), ('\x00est', '')],
      dtype=[('A', '|S4'), ('B', '|S0')])

(Btw, why does np.array(['',''],'|S') return an array with dtype '|S1'?
Why not '|S0'?  Are length-0 arrays being avoided explicitly? If so, why?)
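
One possible way to sidestep the corruption, sketched here on the assumption
that a length-1 string column is an acceptable stand-in for a length-0 one, is
to let NumPy size each column first and then build the record dtype from the
resulting arrays.  The helper name safe_fromarrays and the field names are
illustrative only; they are not part of NumPy.

```python
import numpy as np

def safe_fromarrays(cols, names):
    """Build a recarray whose field sizes are chosen by NumPy.

    Hypothetical helper: each column is converted with np.asarray first,
    and the dtype assembled from those arrays is passed explicitly.
    """
    arrays = [np.asarray(c) for c in cols]
    dtype = np.dtype([(n, a.dtype) for n, a in zip(names, arrays)])
    return np.rec.fromarrays(arrays, dtype=dtype)

# Byte strings are used so the columns are 'S' fields on Python 3 as well.
cols = [[b"test"] * 10, [b""] * 10]
rec = safe_fromarrays(cols, ["A", "B"])

print(rec.dtype)          # field sizes as chosen by NumPy
print(rec["A"].tolist())  # first column, to check for corruption
```

This does not explain why the explicit length-0 field misbehaves; it only
avoids ever handing such a field to np.rec.fromarrays.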

Bug #2:  I'm not sure this is a bug, but it is annoying: np.dtype won't
accept '|S0' as a type argument.

>>> np.dtype('|S0')
TypeError: data type not understood

I have to do something like this:

>>> d = np.dtype('|S')
>>> d
dtype('|S0')

to get what I want.   Is this intended?  Regardless, this inconsistency also
means that things like:

>>> np.dtype(d.descr)

can fail even when d is a properly constructed dtype object with a '|S0'
type, which seems a little perverse.
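
A small wrapper along these lines could hide the special case.  This is only a
sketch: string_dtype is a made-up name, and it assumes the unsized 'S' spelling
keeps producing a dtype with itemsize 0, as np.dtype('|S') does in the session
above.

```python
import numpy as np

def string_dtype(length):
    """Return a byte-string dtype of `length` bytes (hypothetical helper)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        # Assumption: the unsized 'S' spelling is accepted and has itemsize 0.
        return np.dtype("S")
    return np.dtype("S%d" % length)

d0 = string_dtype(0)
d4 = string_dtype(4)
print(d0.kind, d0.itemsize)
print(d4.kind, d4.itemsize)
```

It does nothing for the np.dtype(d.descr) round trip, which still goes through
the '|S0' spelling.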

Am I just not supposed to be working with length-0 string columns, period?
