Re: [Numpy-discussion] String & unicode arrays vs text loading in python 3

13 Sep 2016

On Tue, 2016-09-13 at 15:02 +0200, Lluís Vilanova wrote:
...
Hi! I'm giving issue #3184 [1] a shot, based on the observation that
the string dtype ('S') under python 3 uses byte arrays instead of
unicode (the only readable string type in python 3).
This brings two major problems:
* numpy code has to jump through hoops to open and read files as binary
  data to load text into a bytes array, and does not play well with
  users providing string (unicode) arguments
* the repr of these arrays shows strings as b'text' instead of 'text',
  which breaks doctests of software built on numpy
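To make both problems concrete, here is a small sketch (assuming a current NumPy under Python 3) showing that an 'S' array hands back bytes objects, while decoding it with an explicit encoding through `np.char.decode` gives a 'U' array of str. The encoding ("ascii") is just a choice for the example.

```python
import numpy as np

# 'S' stores raw bytes: elements come back as Python bytes objects,
# so they print with a b'' prefix under Python 3.
raw = np.array([b"alpha", b"beta"], dtype="S")
print(raw)

# Decoding with an explicit encoding yields a 'U' (str) array instead.
text = np.char.decode(raw, encoding="ascii")
print(text)

# Going back to bytes needs an explicit encode step as well.
back = np.char.encode(text, encoding="ascii")
print(back)
```

Every crossing between the bytes world and the str world needs an explicit encoding, which is exactly the bookkeeping the text-loading code has to do internally.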
What I'm trying to do is make dtypes 'S' and 'U' equivalent
(NPY_STRING and NPY_UNICODE).
Now the question. Keeping 'S' and 'U' as separate dtypes (but with the
same internal implementation) will provide the best backwards
compatibility, but is more cumbersome to implement.
I am not sure how that could be possible. Those types are fundamentally
different in how they store their data: string types use one byte per
character, while unicode types use 4 bytes per character. You can maybe
default to unicode in more cases in python 3, but you cannot make them
identical internally.
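The storage difference is easy to see from the dtype itemsize. A minimal sketch, assuming a current NumPy: both dtypes below hold 3 characters, but 'U' needs four times the memory.

```python
import numpy as np

# Same length (3 characters), different storage per element.
s = np.dtype("S3")   # 1 byte per character
u = np.dtype("U3")   # 4 bytes per character
print(s.itemsize, u.itemsize)          # 3 12

# The difference scales with the array size.
arr_s = np.array(["foo"] * 1000, dtype="S3")
arr_u = np.array(["foo"] * 1000, dtype="U3")
print(arr_s.nbytes, arr_u.nbytes)      # 3000 12000
```

So aliasing 'S' to 'U' would silently change the memory layout (and the bytes written by `tofile` or read by `frombuffer`) of existing 'S' arrays.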

What about giving `np.loadtxt` an encoding kwarg or something along
those lines?
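For reference, NumPy did later grow such a parameter: as far as I know, `np.loadtxt` accepts `encoding=` from NumPy 1.14 on. A minimal sketch assuming such a version: the file is written as UTF-8 bytes, and `encoding="utf-8"` together with `dtype=str` yields a 'U' array with no b'' prefixes. The file contents are made up for the example.

```python
import os
import tempfile

import numpy as np

# Write a small UTF-8 encoded text file containing non-ASCII words.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
    f.write("café 1\nthé 2\n".encode("utf-8"))
    path = f.name

try:
    # encoding= tells loadtxt how to decode the file;
    # dtype=str asks for a unicode ('U') result column.
    words = np.loadtxt(path, dtype=str, usecols=0, encoding="utf-8")
finally:
    os.remove(path)

print(words)        # ['café' 'thé']
print(words.dtype)
```

The caller states the file's encoding once, and the result is already text, so neither the 'S' dtype nor a manual decode step is involved.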

- Sebastian
...
Is it acceptable to internally just translate all appearances of 'S'
(NPY_STRING) to 'U' (NPY_UNICODE) and get rid of one of the two when
running in python 3?
The main drawback I see is that dtype reprs would not always be as
expected:
   # python 2
   >>> np.array('foo', dtype='S')
   array('foo',
         dtype='|S3')
   # python 3, with 'S' translated to 'U'
   >>> np.array('foo', dtype='S')
   array('foo',
         dtype='<U3')
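The proposed translation can be emulated at the Python level to see the repr consequence. The helper `to_unicode_dtype` below is hypothetical, not a NumPy function; it is a minimal sketch, assuming a current NumPy, that maps an 'S' dtype to a 'U' dtype with the same character count and leaves every other dtype alone.

```python
import numpy as np

def to_unicode_dtype(dt):
    """Hypothetical helper: map an 'S' dtype to 'U' with the same length."""
    dt = np.dtype(dt)
    if dt.kind != "S":
        return dt
    # 'S' stores 1 byte per character, so itemsize equals the length.
    # An unsized 'S' (itemsize 0) maps to an unsized 'U'.
    return np.dtype("U") if dt.itemsize == 0 else np.dtype(f"U{dt.itemsize}")

requested = np.array("foo", dtype="S")
translated = requested.astype(to_unicode_dtype(requested.dtype))

print(repr(requested))   # array(b'foo', dtype='|S3')
print(repr(translated))  # dtype shows as '<U3' on a little-endian machine
```

The user asked for 'S' but gets a 'U' dtype back, which is the surprising repr described above.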
[1] https://github.com/numpy/numpy/issues/3184
Cheers,
  Lluis
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Sebastian Berg