[Numpy-discussion] Python3, genfromtxt and unicode

Antony Lee antony.lee at berkeley.edu
Tue May 1 15:24:37 EDT 2012


Sure, I will.  Right now my workaround is to call genfromtxt once with bytes
and automatic dtype detection, then modify the resulting dtype, replacing the
bytes fields with unicode fields, and pass that new dtype to a second call to
genfromtxt.  A bit awkward, but it gets the job done.
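
For reference, a minimal sketch of that two-pass approach, using the same
sample data as the example quoted below.  The helper name
bytes_fields_to_unicode is made up for illustration and is not part of NumPy;
it assumes the first pass yields a structured dtype.  Depending on the NumPy
version, the first pass may already return unicode fields, in which case the
helper leaves the dtype unchanged.

```python
import io
import numpy as np

def bytes_fields_to_unicode(dtype):
    """Copy a structured dtype, turning each bytes ('S') field into a
    unicode ('U') field with the same number of characters.
    Hypothetical helper, not a NumPy function."""
    fields = []
    for name in dtype.names:
        field = dtype[name]
        if field.kind == "S":
            # For 'S' fields, itemsize is the number of characters.
            field = np.dtype("U%d" % field.itemsize)
        fields.append((name, field))
    return np.dtype(fields)

data = b"abc 1\ndef 2"

# First pass: let genfromtxt detect the column types and string widths.
first = np.genfromtxt(io.BytesIO(data), dtype=None)

# Rewrite the detected dtype, replacing bytes fields with unicode fields.
unicode_dtype = bytes_fields_to_unicode(first.dtype)

# Second pass: read the same data again with the explicit dtype.
second = np.genfromtxt(io.BytesIO(data), dtype=unicode_dtype)
print(second, second.dtype)
```

The cost is that the input is parsed twice, so the source has to be
re-readable (here a fresh BytesIO is created for each pass).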
Antony Lee

2012/5/1 Charles R Harris <charlesr.harris at gmail.com>

>
>
> On Fri, Apr 27, 2012 at 8:17 PM, Antony Lee <antony.lee at berkeley.edu> wrote:
>
>> With bytes fields, genfromtxt(dtype=None) sets the sizes of the fields to
>> the largest number of chars (npyio.py line 1596), but it doesn't do the
>> same for unicode fields, which is a pity.  See example below.
>> I tried to change npyio.py around line 1600 to add that but it didn't
>> work; from my limited understanding the problem comes earlier, in the way
>> StringBuilder is defined(?).
>> Antony Lee
>>
>> import io, numpy as np
>> s = io.BytesIO()
>> s.write(b"abc 1\ndef 2")
>> s.seek(0)
>> t = np.genfromtxt(s, dtype=None) # (or converters={0: bytes})
>> print(t, t.dtype) # -> [(b'a', 1) (b'b', 2)] [('f0', '|S1'), ('f1', '<i8')]
>> s.seek(0)
>> t = np.genfromtxt(s, dtype=None, converters={0: lambda s: s.decode("utf-8")})
>> print(t, t.dtype) # -> [('', 1) ('', 2)] [('f0', '<U0'), ('f1', '<i8')]
>>
>>
> Could you open a ticket for this?
>
> Chuck