[Numpy-discussion] Empty strings not empty?
Christopher Barker
Chris.Barker at noaa.gov
Wed Dec 30 21:08:02 EST 2009
Charles R Harris wrote:
> That is due to type promotion for the ufunc call:
>
> In [17]: a1 = np.array('a\x00\x00\x00')
>
> In [21]: np.array(['a'], dtype=a1.dtype)[0]
> Out[21]: 'a'
>
> In [22]: np.array(['a'], dtype=a1.dtype).tostring()
> Out[22]: 'a\x00\x00\x00'
It took me a bit to figure out what this meant, so in case I'm not the
only one, I thought I'd spell it out:
In [3]: s1 = np.array('a')
In [4]: s1.dtype
Out[4]: dtype('|S1')
so s1's dtype is a length-1 string
In [11]: s2 = np.array('a\x00\x00')
In [12]: s2.dtype
Out[12]: dtype('|S3')
and s2's is a length-3 string
In [13]: s1 == s2
Out[13]: array(True, dtype=bool)
When they are compared, s1 is coerced to a length-3 string by
padding with nulls, and thus they compare equal.
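To see the padding directly, here is a minimal sketch. It assumes Python 3 with a recent NumPy, where the fixed-width 'S' dtype holds bytes, so the literals are written as b'...' rather than the plain strings used in the session above. The name s1_padded is only for illustration.

```python
import numpy as np

# Assumption: Python 3 and a recent NumPy, so byte literals are needed
# to get the fixed-width 'S' dtype discussed in this thread.
s1 = np.array(b'a', dtype='S1')
s2 = np.array(b'a\x00\x00', dtype='S3')

# Widening s1 to three bytes fills the extra room with trailing nulls.
s1_padded = s1.astype('S3')

print(s1_padded.tobytes())   # raw buffer after widening
print(s2.tobytes())          # raw buffer of s2
print(bool(s1 == s2))        # True
```

After widening, the two raw buffers are byte-for-byte identical, which is why the comparison comes out True.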
Otherwise, there is nothing special about zero bytes in a string; only
trailing ones act as padding:
In [14]: s3 = np.array('\x00a\x00')
In [15]: s3 == s2
Out[15]: array(False, dtype=bool)
In [16]: s3 == s1
Out[16]: array(False, dtype=bool)
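The same point as a small script, again assuming Python 3 and bytes literals (my assumption, not what the session above used): a null in the leading position stays in the buffer and takes part in the comparison.

```python
import numpy as np

# Assumption: Python 3 and a recent NumPy, hence the b'...' literals.
s1 = np.array(b'a', dtype='S1')
s2 = np.array(b'a\x00\x00', dtype='S3')
s3 = np.array(b'\x00a\x00', dtype='S3')

# The leading null is stored and is significant.
print(s3.tobytes())
print(bool(s3 == s2))   # False
print(bool(s3 == s1))   # False
```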
The problem is that zero bytes are the only way to pad a
string. I suppose the comparison could be smarter, by comparing without
coercing, but that may not be possible without the ufunc machinery.
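For anyone who needs that stricter behavior today, one way to get it outside the ufunc machinery is to compare the dtype and the raw buffers yourself. This is only a sketch: strict_equal is a hypothetical helper of mine, not a NumPy function, and it assumes Python 3 with bytes literals.

```python
import numpy as np

def strict_equal(a, b):
    """Hypothetical helper: equal only if dtype, shape and raw bytes all match."""
    a = np.asarray(a)
    b = np.asarray(b)
    return (a.dtype == b.dtype
            and a.shape == b.shape
            and a.tobytes() == b.tobytes())

s1 = np.array(b'a', dtype='S1')
s2 = np.array(b'a\x00\x00', dtype='S3')

print(bool(s1 == s2))               # True: s1 is padded before comparing
print(strict_equal(s1, s2))         # False: the widths differ
print(strict_equal(s2, s2.copy()))  # True
```

The cost is that two arrays holding the same text at different widths no longer compare equal, which is presumably not what most users want by default.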
As for printing, I think it simply reflects that numpy strings are null
padded, and most people probably wouldn't want to see all those nulls
every time.
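A quick way to check what is stored versus what is shown, sketched under the same assumption of Python 3 and bytes literals: the value you get back has the trailing nulls stripped, while the item size and the raw buffer still show them.

```python
import numpy as np

# Assumption: Python 3 and a recent NumPy, hence the b'...' literal.
s = np.array(b'a', dtype='S3')

value = s.item()          # Python bytes object, trailing nulls stripped
print(value, len(value))
print(s.dtype.itemsize)   # 3: the storage is still three bytes wide
print(s.tobytes())        # raw buffer, nulls included
```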
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov