[Numpy-discussion] cannot decode 'S'

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Jan 23 12:59:23 EST 2014


Truncating trailing null bytes in 'S' arrays breaks decoding for encodings that need them:

>>> a = np.array([si.encode('utf-16LE') for si in ['Õsc', 'zxc']], dtype='S')
>>> a
array([b'\xd5\x00s\x00c', b'z\x00x\x00c'],
      dtype='|S6')
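To make the truncation visible: 'Õsc' in UTF-16LE is 6 bytes (each character is a 2-byte code unit, with 'c' encoded as b'c\x00'), and the array's itemsize is indeed 6, but NumPy's 'S' dtype strips trailing null bytes on scalar access. A minimal sketch of that behavior:

```python
import numpy as np

encoded = 'Õsc'.encode('utf-16LE')   # 6 bytes: b'\xd5\x00s\x00c\x00'
a = np.array([encoded], dtype='S')   # stored with itemsize 6

item = a[0]                          # trailing \x00 is stripped on access
assert a.dtype.itemsize == 6
assert len(encoded) == 6
assert len(item) == 5                # the final null byte of 'c' is gone
```

The decode then fails because the last code unit is left with only one of its two bytes.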

>>> [ai.decode('utf-16LE') for ai in a]
Traceback (most recent call last):
  File "<pyshell#118>", line 1, in <module>
    [ai.decode('utf-16LE') for ai in a]
  File "<pyshell#118>", line 1, in <listcomp>
    [ai.decode('utf-16LE') for ai in a]
  File "C:\Programs\Python33\lib\encodings\utf_16_le.py", line 16, in decode
    return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x63 in position
4: truncated data

Messy workaround (arrays, in contrast to scalars, are not truncated by `tostring`):

>>> [a[i:i+1].tostring().decode('utf-16LE') for i in range(len(a))]
['Õsc', 'zxc']
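For comparison, a sketch of an alternative workaround: re-pad each stripped scalar back to the array's itemsize with null bytes before decoding. This assumes the stripped nulls were part of the encoding, i.e. the original strings do not themselves end in U+0000:

```python
import numpy as np

a = np.array([s.encode('utf-16LE') for s in ['Õsc', 'zxc']], dtype='S')

# bytes.ljust restores the trailing \x00 bytes that 'S' scalar access stripped
decoded = [bytes(ai).ljust(a.dtype.itemsize, b'\x00').decode('utf-16LE')
           for ai in a]
# decoded == ['Õsc', 'zxc']
```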

Found while playing with examples in the other thread.

Josef
