[Numpy-discussion] numpy.array() of mixed integers and strings can truncate data
Chris Barker
Chris.Barker at noaa.gov
Thu Dec 1 15:35:00 EST 2011
On 12/1/2011 9:15 AM, Derek Homeier wrote:
>>>> np.array((2, 12,0.001+2j), dtype='|S8')
> array(['2', '12', '(0.001+2'], dtype='|S8')
>
> - notice the last value is only truncated because it had first been converted into
> a "standard" complex representation, so maybe the problem is already in the way
> Python treats the input.
No, it's truncated because you've specified an 8-character string, and
the string representation of this complex value is longer than that (10
characters). I assume that numpy is using the object's __str__ or __repr__:
In [13]: str(0.001+2j)
Out[13]: '(0.001+2j)'
In [14]: repr(0.001+2j)
Out[14]: '(0.001+2j)'
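A minimal pure-Python model of that truncation, assuming (as above) that each element is formatted with str(), encoded as ASCII, and cut to the dtype's itemsize. The helper name truncate_to_itemsize is hypothetical and is not numpy's internal code:

```python
def truncate_to_itemsize(value, itemsize):
    """Hypothetical model of fixed-width byte-string storage (an assumption,
    not numpy's implementation): format with str(), encode as ASCII, and
    keep only the first `itemsize` bytes."""
    return str(value).encode("ascii")[:itemsize]


for v in (2, 12, 0.001 + 2j):
    # '(0.001+2j)' is 10 characters, so an 8-byte slot keeps '(0.001+2'
    print(truncate_to_itemsize(v, 8))
```

The last value comes out as b'(0.001+2', which matches the array Derek posted.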
I think the only bug we've identified here is that numpy selects the
string size based on the longest string input, rather than also checking
how long the string representations of the numeric inputs are. If there
is a long-enough string in the input, it works fine:
In [15]: np.array([-345,4,2,'ABC', 'abcde'])
Out[15]:
array(['-345', '4', '2', 'ABC', 'abcde'],
dtype='|S5')
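A rough sketch of the check that seems to be missing: size the string dtype from the str() form of every element, numeric or not. The helper name required_itemsize is hypothetical, and the sketch assumes numpy formats elements with str():

```python
def required_itemsize(values):
    """Hypothetical helper: the smallest width at which no element's str()
    form is cut off, counting numeric inputs as well as strings."""
    return max((len(str(v)) for v in values), default=0)


print(required_itemsize([-345, 4, 2, 'ABC', 'abcde']))  # 5, as numpy chose above
print(required_itemsize([2, 12, 0.001 + 2j]))           # 10, not 8
print(required_itemsize([12345, 'ab']))                 # 5, not 2
```

The result could then be passed explicitly, e.g. np.array(values, dtype='S%d' % width), so that the width no longer depends on which string input happens to be longest.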
An open question is what it should do if you specify the length of the
string dtype but one of the values can't fit into that size. Currently
it truncates; should it raise an error instead?
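One way the "raise an error" option could look, sketched in pure Python. The function to_fixed_width and its strict flag are hypothetical (not a numpy API), and the sketch again assumes elements are formatted with str():

```python
def to_fixed_width(values, itemsize, strict=True):
    """Hypothetical converter: encode each value's str() form as ASCII bytes
    of at most `itemsize` bytes. With strict=True, raise ValueError instead
    of silently truncating; with strict=False, truncate as numpy does now."""
    out = []
    for v in values:
        data = str(v).encode("ascii")
        if len(data) > itemsize:
            if strict:
                raise ValueError(
                    f"{v!r} needs {len(data)} bytes but itemsize is {itemsize}"
                )
            data = data[:itemsize]  # silent truncation (current behaviour)
        out.append(data)
    return out


try:
    to_fixed_width((2, 12, 0.001 + 2j), 8)
except ValueError as err:
    print(err)  # (0.001+2j) needs 10 bytes but itemsize is 8
```

A flag like this would keep the current truncating behaviour available while letting callers opt into an error when data would be lost.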
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov