[Numpy-discussion] Python3, genfromtxt and unicode

Tue May 1 13:37:48 EDT 2012

On Fri, Apr 27, 2012 at 8:17 PM, Antony Lee <antony.lee at berkeley.edu> wrote:

> With bytes fields, genfromtxt(dtype=None) sets the sizes of the fields to
> the largest number of chars (npyio.py line 1596), but it doesn't do the
> same for unicode fields, which is a pity.  See example below.
> I tried to change npyio.py around line 1600 to add that, but it didn't
> work; from my limited understanding, the problem comes earlier, in the way
> StringConverter is defined (?).
> Antony Lee
>
> import io, numpy as np
> s = io.BytesIO()
> s.write(b"abc 1\ndef 2")
> s.seek(0)
> t = np.genfromtxt(s, dtype=None) # (or converters={0: bytes})
> print(t, t.dtype) # -> [(b'abc', 1) (b'def', 2)] [('f0', '|S3'), ('f1', '<i8')]
> s.seek(0)
> t = np.genfromtxt(s, dtype=None,
>                   converters={0: lambda s: s.decode("utf-8")})
> print(t, t.dtype) # -> [('', 1) ('', 2)] [('f0', '<U0'), ('f1', '<i8')]
>
>
Could you open a ticket for this?
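
In the meantime, one possible workaround is to let genfromtxt size the
field as bytes and decode it afterwards. The sketch below is only an
illustration, not a fix to genfromtxt itself: decode_bytes_fields is a
hypothetical helper name, and the array "raw" is built by hand to stand in
for a genfromtxt result that has a correctly sized bytes field.

```python
import numpy as np


def decode_bytes_fields(arr, encoding="utf-8"):
    """Return a copy of a structured array with bytes ('S') fields decoded.

    Hypothetical helper, not part of NumPy. Each decoded field gets a
    unicode ('U') dtype wide enough for its longest decoded value.
    """
    columns = []
    descr = []
    for name in arr.dtype.names:
        col = arr[name]
        if col.dtype.kind == "S":
            # np.char.decode returns a unicode array sized to the data.
            col = np.char.decode(col, encoding)
        columns.append(col)
        descr.append((name, col.dtype))

    out = np.empty(arr.shape, dtype=descr)
    for name, col in zip(arr.dtype.names, columns):
        out[name] = col
    return out


# Stand-in for the bytes result shown in the example above (assumption:
# the first field was read as bytes with the right width).
raw = np.array([(b"abc", 1), (b"def", 2)],
               dtype=[("f0", "S3"), ("f1", "i8")])
fixed = decode_bytes_fields(raw)
print(fixed, fixed.dtype)
```

Decoding after the read sidesteps the converter path entirely, so the
field width comes from the data rather than from the converter's default.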

Chuck