[Numpy-discussion] Unicode characters in a numpy array
Gökhan Sever
gokhansever at gmail.com
Fri Jun 17 12:48:30 EDT 2011
On Thu, Jun 16, 2011 at 8:54 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Wed, Jun 15, 2011 at 1:30 PM, Gökhan Sever <gokhansever at gmail.com> wrote:
>>
>> Hello,
>>
>> The following snippet works fine for a regular string and prints out
>> the string without a problem.
>>
>> python
>> Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
>> [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> mystr = u"öööğğğ"
>> >>> mystr
>> u'\xf6\xf6\xf6\u011f\u011f\u011f'
>> >>> type(mystr)
>> <type 'unicode'>
>> >>> print mystr
>> öööğğğ
>>
>> What is the correct way to print out the following array?
>>
>> >>> import numpy as np
>> >>> arr = np.array(u"öööğğğ")
>> >>> arr
>> array(u'\xf6\xf6\xf6\u011f\u011f\u011f',
>> dtype='<U6')
>> >>> print arr
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line 1379, in array_str
>>     return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
>>   File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 426, in array2string
>>     lst = style(x)
>> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
>>
>
> I don't know. It might be that we need to fix the printing functions for
> unicode and maybe have some way to set the codec as well.
>
> Chuck
>
Typing
arr = np.array(u"öööğğğ")
yields UnicodeEncodeError: 'ascii' codec can't encode characters in
position 17-22: ordinal not in range(128)
in IPython 0.10. I am not sure whether this is fixed in the upcoming IPython release.
Wrapping the string in a list (with brackets) makes a difference:
>>> arr = np.array([u"öööğğğ"])
>>> print arr
[u'\xf6\xf6\xf6\u011f\u011f\u011f']
>>> print arr[0]
öööğğğ
I am wondering whether "print arr" should print the unicode
characters in human-readable form or keep the current escaped form.
The same behavior applies to regular Python lists:
>>> mylist = [u"öööğğğ"]
>>> print mylist
[u'\xf6\xf6\xf6\u011f\u011f\u011f']
>>> print mylist[0]
öööğğğ
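A minimal sketch of one possible workaround, assuming the goal is only to display the elements readably: printing a list (or a one-element array, as above) shows the repr() of each element, so joining the unicode elements directly avoids the escapes. The helper name format_items is hypothetical and is not part of Python or NumPy; the escapes in the literal spell the same text as u"öööğğğ".

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def format_items(items):
    """Hypothetical helper: join the elements themselves, not their repr().

    Assumes every element is already a unicode string.
    """
    return "[" + ", ".join(items) + "]"


# Same text as u"öööğğğ", written with escapes so the source stays ASCII.
mylist = ["\xf6\xf6\xf6\u011f\u011f\u011f"]

# A single unicode string with no escaped repr() in it.
readable = format_items(mylist)
```

Printing readable should then show the characters themselves rather than the escapes, provided the terminal encoding can represent them. The same helper should also accept a one-dimensional unicode array, since it only iterates over the elements.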