[Numpy-discussion] Empty strings not empty?

Christopher Barker Chris.Barker at noaa.gov
Wed Dec 30 21:08:02 EST 2009


Charles R Harris wrote:
> That is due to type promotion for the ufunc call:
> 
> In [17]: a1 = np.array('a\x00\x00\x00')
> 
> In [21]: np.array(['a'], dtype=a1.dtype)[0]
> Out[21]: 'a'
> 
> In [22]: np.array(['a'], dtype=a1.dtype).tostring()
> Out[22]: 'a\x00\x00\x00'

it took me a bit to figure out what this meant, so in case I'm not the 
only one, I thought I'd spell it out:

In [3]: s1 = np.array('a')

In [4]: s1.dtype
Out[4]: dtype('|S1')

so s1's dtype is a length-1 string

In [11]: s2 = np.array('a\x00\x00')

In [12]: s2.dtype
Out[12]: dtype('|S3')

and s2's is a length-3 string

In [13]: s1 == s2
Out[13]: array(True, dtype=bool)

when they are compared, s1 is coerced to a length-3 string by padding 
with nulls, and thus they compare equal.
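
For anyone who wants to poke at this outside an interactive session, here is a minimal sketch of the same comparison. It assumes byte strings with explicit 'S' dtypes (my choice, so the result does not depend on whether a plain 'a' literal is bytes or unicode in your Python); the variable names just mirror the ones above.

```python
import numpy as np

# Explicit byte-string dtypes, so the item sizes are exactly what we ask for.
s1 = np.array(b'a', dtype='S1')
s2 = np.array(b'a\x00\x00', dtype='S3')

# The two arrays really do store a different number of bytes per item.
size1 = s1.dtype.itemsize
size2 = s2.dtype.itemsize

# The comparison treats the shorter string as if it were null padded,
# so the trailing nulls in s2 do not make the values differ.
equal = bool(s1 == s2)

print(size1, size2, equal)
```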

otherwise, there is nothing special about zero bytes in a string:

In [14]: s3 = np.array('\x00a\x00')

In [15]: s3 == s2
Out[15]: array(False, dtype=bool)

In [16]: s3 == s1
Out[16]: array(False, dtype=bool)

The problem is that zero bytes are the only way to pad a 
string. I suppose the comparison could be smarter, by comparing without 
coercing, but that may not be possible without the ufunc machinery.
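
To make "comparing without coercing" concrete, here is a rough sketch of what a stricter check could look like at the Python level. strict_equal is a hypothetical helper of my own, not anything in numpy, and it sidesteps the ufunc question entirely by looking at the dtype item size and the raw buffer.

```python
import numpy as np

def strict_equal(a, b):
    """Hypothetical helper: True only if shape, item size and raw bytes all match."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.dtype.kind != 'S' or b.dtype.kind != 'S':
        raise TypeError("expected byte-string ('S') arrays")
    return (a.shape == b.shape
            and a.dtype.itemsize == b.dtype.itemsize
            and a.tobytes() == b.tobytes())

s1 = np.array(b'a', dtype='S1')
s2 = np.array(b'a\x00\x00', dtype='S3')
s2_copy = np.array(b'a\x00\x00', dtype='S3')

different_lengths = strict_equal(s1, s2)    # item sizes differ
same_everything = strict_equal(s2, s2_copy) # identical dtype and bytes
print(different_lengths, same_everything)
```

This gives one answer for the whole array rather than an elementwise result, so it is only an illustration of the idea, not a drop-in replacement for ==.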

As for printing, I think it simply reflects that numpy strings are null 
padded, and most people probably wouldn't want to see all those nulls 
every time.
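
A small sketch of the difference between what is stored and what you get back when you pull a value out, again assuming explicit 'S' dtypes. Only trailing nulls are dropped; a leading null is part of the value.

```python
import numpy as np

s2 = np.array(b'a\x00\x00', dtype='S3')
s3 = np.array(b'\x00a\x00', dtype='S3')

# The underlying buffer keeps every byte, padding included.
raw2 = s2.tobytes()
raw3 = s3.tobytes()

# Extracting the value as a Python object strips the trailing nulls only.
value2 = s2.item()
value3 = s3.item()

print(raw2, value2)
print(raw3, value3)
```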

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


