[Numpy-discussion] using loadtxt to load a text file in to a numpy array

Pauli Virtanen pav at iki.fi
Fri Jan 17 13:40:41 EST 2014

17.01.2014 15:09, Aldcroft, Thomas wrote:
> I've been playing around with porting a stack of analysis libraries
> to Python 3 and this is a very timely thread and comment.  What I
> discovered right away is that all the string data coming from
> binary HDF5 files show up (as expected) as 'S' type, but that
> trying to make everything actually work in Python 3 without
> converting to 'U' is a big mess of whack-a-mole.
> Yes, it's possible to change my libraries to use bytestring
> literals everywhere, but the Python 3 user experience becomes
> horrible because to interact with the data all downstream
> applications need to use bytestring literals everywhere.  E.g.
> doing a simple filter like `string_array == 'foo'` doesn't work,
> and this will break all existing code when trying to run in Python
> 3.  And every time you try to print something it has this horrible
> "b" in front.  Ugly, and it just won't work well in the end.

Ok, I see your point.
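
For reference, the situation described above can be sketched roughly as
follows on Python 3. The array contents are made up for illustration,
and decoding with ASCII is an assumption about the data, not something
stated in the thread:

```python
import numpy as np

# Byte strings as they might come out of a binary file (dtype kind 'S').
raw = np.array([b"foo", b"bar", b"foo"])

# Filtering with a bytes literal works on Python 3.
mask_bytes = raw == b"foo"

# Decoding once up front gives a 'U' array, so plain str literals
# can be used for filtering afterwards.
decoded = np.char.decode(raw, "ascii")
mask_text = decoded == "foo"

print(raw.dtype.kind, decoded.dtype.kind)
print(mask_bytes.tolist(), mask_text.tolist())
```

The cost of the decode step is that every character then takes four
bytes instead of one.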

Having additional Unicode data types with smaller widths could be
useful. On Python 2, they would then be Unicode strings, right? Thanks
to Py2's automatic Unicode encoding/decoding, they might also be usable
in interactive and similar use on Py2.
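
To illustrate why smaller widths matter, here is a small sketch
comparing the storage of the existing 'S' and 'U' types. The
three-character width is an arbitrary choice for the example:

```python
import numpy as np

# Fixed-width dtypes holding three characters each.
s_dtype = np.dtype("S3")  # one byte per character
u_dtype = np.dtype("U3")  # four bytes per character (UCS-4)

print(s_dtype.itemsize)
print(u_dtype.itemsize)
```

A narrower Unicode type would sit somewhere between these two in
storage cost, depending on the encoding chosen.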

Adding new data types in Numpy codebase takes some work, but it's
possible to do.

There's also an issue (as noted in the GitHub ticket) that
array([u'foo'], dtype=bytes) encodes silently via the ASCII codec.
This is probably not how it should be.
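
A minimal sketch of the behaviour in question, next to a spelling that
names the codec explicitly. The use of np.char.encode with UTF-8 here
is only one possible alternative, chosen for illustration:

```python
import numpy as np

text = np.array(["foo"])

# Implicit conversion: no codec is named anywhere in this call.
implicit = np.array(["foo"], dtype=bytes)

# Explicit conversion: the codec is visible at the call site.
explicit = np.char.encode(text, "utf-8")

print(implicit.dtype, explicit.dtype)
```

With the explicit form, a reader can at least see which encoding is
being applied to the data.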

Pauli Virtanen
