[Numpy-discussion] String & unicode arrays vs text loading in python 3

Stephan Hoyer shoyer at gmail.com
Tue Sep 13 14:21:21 EDT 2016


On Tue, Sep 13, 2016 at 11:05 AM, Lluís Vilanova <vilanova at ac.upc.edu>
wrote:

> Whenever we repr an array using 'S', we can instead show a unicode in py3. That
> keeps the binary representation, but will always show the expected result to
> users, and it's only a handful of lines added to dump_data().
>
> If needed, I could easily add a bytes array to make the alternative explicit
> (where py3 would repr the contents as b'foo').
>
> This would only leave the less-common paths inconsistent across python versions,
> which should not be a problem for most examples/doctests:
>
> * A 'U' array will show u'foo' in py2 and 'foo' in py3.
> * The new binary array will show 'foo' in py2 and b'foo' in py3 (that could also
>   be patched on the repr code).
> * A 'O' array will not be able to do any meaningful repr conversions.
>
>
> A more complex alternative (and actually closer to what I'm proposing) is to
> modify numpy in py3 to restrict 'S' to using 8-bit points in a unicode
> string. It would have the binary compatibility, while being a unicode string in
> practice.


I'm afraid these are both also non-starters at this point. NumPy's string
dtype corresponds to bytes on Python 3, and you can use it to store
arbitrary binary values. Would it really be an improvement to change the
repr, if the scalar value resulting from indexing is still bytes?
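
To make the mismatch concrete, here is a minimal sketch of the current Python 3
behaviour (the variable names are only for illustration). Whatever the repr
showed, indexing would still hand back bytes, and the same dtype can hold
values that are not text at all:

```python
import numpy as np

# An 'S' array built from bytes literals; NumPy infers dtype 'S3'.
arr = np.array([b"foo", b"bar"])

# Indexing returns a NumPy scalar that is a subclass of Python's bytes,
# so it has to be decoded explicitly before it can be used as text.
item = arr[0]
as_text = item.decode("ascii")

# The same dtype also stores arbitrary binary values that are not valid
# text in common encodings such as ASCII or UTF-8.
raw = np.array([b"\xff\xfe"], dtype="S2")
```

A repr that displayed 'foo' here would disagree with what arr[0] actually
returns, and it would have no sensible way to display the contents of raw.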

The sanest approach is probably a new dtype for one-byte strings. We talked
about this a few years ago, but nobody has implemented it yet:
http://numpy-discussion.scipy.narkive.com/3nqDu3Zk/a-one-byte-string-dtype
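
Until something like that exists, one possible workaround (my own sketch, not
the dtype proposed in that thread) is to do the encoding by hand: store the
text as Latin-1 bytes in an 'S' array and decode on the way out. The choice of
Latin-1 is an assumption here; it only works for text that fits in that
encoding.

```python
import numpy as np

text = ["foo", "café"]

# Latin-1 maps each of these characters to exactly one byte, so the
# resulting 'S' array uses one byte per character.
encoded = np.array([s.encode("latin-1") for s in text])

# np.char.decode applies bytes.decode element-wise and returns a 'U' array,
# which uses four bytes per character.
decoded = np.char.decode(encoded, "latin-1")

bytes_per_item = (encoded.itemsize, decoded.itemsize)
```

The cost is that every consumer has to remember the encoding, which is
exactly the bookkeeping a dedicated dtype would take over.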

(normally I would link to the archives on scipy.org, but the certificate
for HTTPS has expired so you see a big error message right now...)