[Numpy-discussion] Unicode characters in a numpy array

Gökhan Sever gokhansever at gmail.com
Fri Jun 17 12:48:30 EDT 2011


On Thu, Jun 16, 2011 at 8:54 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Wed, Jun 15, 2011 at 1:30 PM, Gökhan Sever <gokhansever at gmail.com> wrote:
>>
>> Hello,
>>
>> The following snippet works fine for a regular string and prints out
>> the string without a problem.
>>
>> python
>> Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
>> [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> mystr = u"öööğğğ"
>> >>> mystr
>> u'\xf6\xf6\xf6\u011f\u011f\u011f'
>> >>> type(mystr)
>> <type 'unicode'>
>> >>> print mystr
>> öööğğğ
>>
>> What is the correct way to print out the following array?
>>
>> >>> import numpy as np
>> >>> arr = np.array(u"öööğğğ")
>> >>> arr
>> array(u'\xf6\xf6\xf6\u011f\u011f\u011f',
>>      dtype='<U6')
>> >>> print arr
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py",
>> line 1379, in array_str
>>    return array2string(a, max_line_width, precision, suppress_small,
>> ' ', "", str)
>>  File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py",
>> line 426, in array2string
>>    lst = style(x)
>> UnicodeEncodeError: 'ascii' codec can't encode characters in position
>> 0-5: ordinal not in range(128)
>>
>
> I don't know. It might be that we need to fix the printing functions for
> unicode and maybe have some way to set the codec as well.
>
> Chuck
>

In IPython 0.10, typing

arr = np.array(u"öööğğğ")

fails with "UnicodeEncodeError: 'ascii' codec can't encode characters in
position 17-22: ordinal not in range(128)". I am not sure whether this is
fixed in the upcoming IPython release.

Wrapping the string in a list (note the brackets) makes a difference:

>>> arr = np.array([u"öööğğğ"])
>>> print arr
[u'\xf6\xf6\xf6\u011f\u011f\u011f']
>>> print arr[0]
öööğğğ

I am wondering whether "print arr" should print the unicode characters
in human-readable form rather than as escape sequences, as it does now.

The same applies to regular Python lists:

>>> mylist = [u"öööğğğ"]
>>> print mylist
[u'\xf6\xf6\xf6\u011f\u011f\u011f']
>>> print mylist[0]
öööğğğ
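
The difference comes from print showing the repr() of each element when
it is given a list, while print on a single element shows the text
itself. A minimal sketch of one way to get readable output is to join the
elements yourself. The helper name format_items is made up for this
example and is not part of Python or NumPy, and it assumes every element
is already a unicode string. The __future__ imports are only there so the
same code runs on Python 2.7 and Python 3; on Python 2 the final print
still relies on the terminal encoding, as "print mystr" does above.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def format_items(items):
    """Render a sequence of unicode strings using their text, not their repr().

    Hypothetical helper for illustration only; assumes every element
    is a unicode string.
    """
    return "[" + ", ".join(items) + "]"


mylist = ["öööğğğ"]
readable = format_items(mylist)
print(readable)
```

For a 1-d array such as the bracketed one above, passing arr.tolist()
to the same helper should work the same way. It is not suitable for the
0-d array, where tolist() returns the string itself rather than a list.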


