On Tue, Sep 13, 2016 at 11:05 AM, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
> Great, that's the type of info I wanted to get before going forward. I guess
> there's code relying on the binary representation of 'S' to do mmaps or access
> the array's raw contents. Is that right?

Yes, there is a LOT of code, most of it third-party, that relies on the particular binary representations of the numpy dtypes.
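To make that concrete, here is a minimal sketch (assuming a recent NumPy under Python 3): an 'S4' array stores every element as exactly 4 bytes, padded with NUL bytes, and anything that reinterprets the raw buffer depends on that layout.

```python
import numpy as np

# Fixed-width 'S4': each element occupies exactly 4 bytes;
# shorter values are padded with NUL (b"\x00") bytes.
arr = np.array([b"ab", b"wxyz"], dtype="S4")

print(arr.itemsize)   # 4
print(arr.tobytes())  # b'ab\x00\x00wxyz'

# Code that reads the raw buffer (np.frombuffer, mmap, C extensions, ...)
# relies on this exact layout; trailing NULs are stripped on element access.
raw = np.frombuffer(arr.tobytes(), dtype="S4")
print(raw.tolist())   # [b'ab', b'wxyz']
```

Changing what 'S' means at the binary level would silently break every reader like the last two lines.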

> There is a fundamental semantic difference between a string and a byte array;
> that's the core of the problem.

Well, yes. But they were mingled in py2, and the 'S' dtype is essentially a py2 string. In py3, it maps more closely to bytes than to str, though, yes, not exactly to either :-(
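A quick way to see the py3 behavior, as a sketch assuming a recent NumPy: the scalar you get out of an 'S' array is np.bytes_, which subclasses bytes, not str.

```python
import numpy as np

arr = np.array([b"spam", b"eggs"], dtype="S4")
item = arr[0]

# Under Python 3 the scalar type for 'S' is np.bytes_, a bytes subclass.
print(isinstance(item, bytes))  # True
print(isinstance(item, str))    # False
print(arr.tolist())             # [b'spam', b'eggs']
```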

> Here's an alternative that only handles the repr.
>
> Whenever we repr an array using 'S', we can instead show a unicode string in py3.
> That keeps the binary representation, but will always show the expected result
> to users, and it's only a handful of lines added to dump_data().

This would probably be more confusing than helpful: if an 'S' element converts to a bytes object, then its repr should show that.
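For comparison, a sketch of keeping the repr honest and decoding explicitly when text is wanted (assuming a recent NumPy, and assuming the bytes are ASCII; `np.char.decode` is used here as one way to do it):

```python
import numpy as np

arr = np.array([b"spam", b"eggs"], dtype="S4")

# The repr shows bytes literals, matching what the elements convert to.
print(repr(arr))

# Explicit conversion to a unicode ('U') array when text is what you want.
text = np.char.decode(arr, "ascii")
print(text.tolist())  # ['spam', 'eggs']
```

The repr then tells the truth about the dtype, and the conversion to text is visible in the user's own code.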

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov