[Numpy-discussion] numpy.array() of mixed integers and strings can truncate data

Thu Dec 1 15:35:00 EST 2011

On 12/1/2011 9:15 AM, Derek Homeier wrote:
>>>> np.array((2, 12,0.001+2j), dtype='|S8')
>   array(['2', '12', '(0.001+2'], dtype='|S8')
>
> - notice the last value is only truncated because it had first been converted into
> a "standard" complex representation, so maybe the problem is already in the way
> Python treats the input.

No, it's truncated because you've specified an 8-character string, and
the string representation of the complex value is longer than that. I
assume that numpy is using the object's __str__ or __repr__:

In [13]: str(0.001+2j)
Out[13]: '(0.001+2j)'

In [14]: repr(0.001+2j)
Out[14]: '(0.001+2j)'
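
To make the length mismatch concrete, here is a minimal sketch. It assumes a Python 3 interpreter and uses the unicode dtype 'U8' as a stand-in for '|S8', so it illustrates the fixed-width truncation rather than reproducing the exact session above:

```python
import numpy as np

# The string form of the complex value is 10 characters long.
text = str(0.001 + 2j)

# Storing it in an 8-character field keeps only the first 8 characters.
# 'U8' is an assumption here, used in place of the '|S8' dtype above.
fixed = np.array([text], dtype="U8")

print(len(text))   # 10
print(fixed[0])    # (0.001+2
```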

I think the only bug we've identified here is that numpy is selecting
the string size based on the longest string input, rather than also
checking how long the string representation of each numeric input is.
If there is a long-enough string in there, it works fine:

In [15]: np.array([-345,4,2,'ABC', 'abcde'])
Out[15]:
array(['-345', '4', '2', 'ABC', 'abcde'],
       dtype='|S5')
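
One way to sidestep the sizing issue is to convert everything to strings first and size the dtype from the longest result. The sketch below is only an illustration: to_string_array is a hypothetical helper, not a numpy function, and it assumes Python 3 with a unicode ('U') dtype rather than '|S':

```python
import numpy as np

def to_string_array(values):
    """Hypothetical helper: build a string array wide enough for every value."""
    # Convert each item with str() so numeric inputs are measured too.
    texts = [str(v) for v in values]
    # Use the longest string representation as the field width (at least 1).
    width = max(1, max((len(t) for t in texts), default=1))
    return np.array(texts, dtype=f"U{width}")

# Here the longest item is the number -345, not one of the strings.
arr = to_string_array([-345, 4, 2, "ABC", "ab"])
print(arr.dtype)     # <U4 on little-endian machines
print(arr.tolist())  # ['-345', '4', '2', 'ABC', 'ab']
```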

An open question is what it should do if you specify the length of the
string dtype, but one of the values can't fit into that size. Currently
it truncates, but should it raise an error?
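
For comparison, the "raise an error" alternative could look roughly like this. strict_string_array is a hypothetical helper written for illustration, not existing numpy behaviour, and it again assumes Python 3 with a unicode dtype:

```python
import numpy as np

def strict_string_array(values, width):
    """Hypothetical helper: refuse to truncate instead of doing so silently."""
    texts = [str(v) for v in values]
    # Collect every value whose string form would not fit in the field.
    too_long = [t for t in texts if len(t) > width]
    if too_long:
        raise ValueError(
            f"values do not fit in {width} characters: {too_long}"
        )
    return np.array(texts, dtype=f"U{width}")

# Short values fit and are stored unchanged.
ok = strict_string_array([2, 12], 8)

# The complex value needs 10 characters, so this raises ValueError.
try:
    strict_string_array([2, 12, 0.001 + 2j], 8)
    raised = False
except ValueError:
    raised = True
```

Whether an error is better than truncation depends on the caller: a check like this is safest for data that must round-trip, while truncation may be what someone asking for a fixed-width field actually wants.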

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov