On Wed, Jul 16, 2014 at 6:48 AM, Todd <toddrjen@gmail.com> wrote:

On Jul 16, 2014 11:43 AM, "Chris Barker" <chris.barker@noaa.gov> wrote:
> So numpy should have dtypes to match these. We're a bit stuck, however, because 'S' mapped to the py2 string type, which no longer exists in py3. Sorry, I'm not running py3 to see what 'S' does now, but I know it's a bit broken, and it may be too late to change it.

In py3, an element of an 'S' dtype array is converted to a Python `bytes` object.
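
To make that concrete, here is a minimal sketch, assuming NumPy on Python 3. The array contents and variable names are made up for illustration.

```python
import numpy as np

# Fixed-width byte strings, 3 bytes per element (illustrative data).
names = np.array([b"abc", b"de"], dtype="S3")

item = names[0]
# On Python 3 the scalar is numpy.bytes_, a subclass of the built-in bytes.
is_bytes = isinstance(item, bytes)

# Getting text back requires an explicit decode step.
text = item.decode("ascii")
print(type(item), is_bytes, text)
```

So any Py2-era code that treated these elements as plain `str` has to grow a decode call somewhere on Py3.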

As a slightly philosophical aside, at some point during SciPy, Nick Coghlan said that the core Python team had stopped recommending the use of `from __future__ import unicode_literals` for Python 2 / 3 compatible code.  I now have some experience writing 2 / 3 compatible code for astropy, and I came to the same conclusion.  The point is that `str` is the "natural" text class that is used by default in both 2 and 3.  Most scientific Py2 code is written to this model.

Following this to the Py3 end, that would imply that the most natural convention for the numpy S dtype in Py3 would be that it reaches Python as a `str` decoded from utf-8, as Chuck suggested.  I think the variable-length encoding issue is not such a problem if you follow the existing numpy convention of truncating overflowing strings on assignment.
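
A rough sketch of the truncation convention and of what it means for utf-8, assuming NumPy on Python 3. `decode_truncated` is a hypothetical helper, not a NumPy function, and dropping a partial trailing character is only one possible policy.

```python
import numpy as np

# Existing convention: values longer than the field are silently truncated.
buf = np.zeros(1, dtype="S4")
buf[0] = b"abcdefgh"
truncated = buf[0]  # only the first 4 bytes are kept

# With utf-8 the cut can land inside a multi-byte character.
encoded = "na\u00efve".encode("utf-8")  # 6 bytes; the i-diaeresis takes 2
narrow = np.zeros(1, dtype="S3")
narrow[0] = encoded
raw = narrow[0]  # ends with the first byte of a 2-byte sequence


def decode_truncated(raw_bytes):
    """Hypothetical helper: decode utf-8, dropping an incomplete trailing character."""
    return bytes(raw_bytes).decode("utf-8", errors="ignore")


print(truncated, raw, decode_truncated(raw))
```

Note that `errors="ignore"` also hides any other invalid bytes, so a real implementation would probably want to trim only an incomplete final sequence.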

Using utf-8 like this would (I think) make most Py2 code that uses HDF5 and FITS ASCII string data just work out of the box on Py3, which would be super.
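
For comparison, this is roughly what such code has to do by hand on Py3 today. It is only a sketch: the array stands in for a string column read from a FITS or HDF5 file, and the object names are made up.

```python
import numpy as np

# Stand-in for an ASCII string column as it arrives from a file reader.
column = np.array([b"M31", b"NGC224"], dtype="S6")

# Explicit conversion to a unicode ('U') array so that comparisons
# against ordinary str literals behave as they did on Py2.
decoded = np.char.decode(column, "utf-8")

mask = decoded == "M31"
print(decoded.dtype.kind, mask)
```

If 'S' came back as a utf-8 `str`, the `np.char.decode` step would be unnecessary.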

- Tom
 

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion