[Numpy-discussion] One-byte string dtype: third time's the charm?

Mon Feb 23 12:19:46 EST 2015

On Mon, Feb 23, 2015 at 11:55 AM, Andrew Collette <andrew.collette at gmail.com> wrote:

> Hi all,
>
> > Using latin-1 is a pragmatic compromise that provides continuity to allow
> > scientists to run their existing code in Python 3 and have things just
> > work. It isn't perfect and it should not be the end of the story, but it
> > would be good.  This single issue is the *only* thing blocking me and my
> > team from using Python 3 in operations.
>
> Since you mentioned HDF compatibility, I would just note that the two
> string formats HDF5 supports are ASCII and UTF-8, although presently
> no validation is performed by HDF5 as to the actual contents.  This
> shouldn't discourage anyone from going with Latin-1, but it would mean
> that h5py (and presumably PyTables) would have to choose from the
> following options:
>
> 1. Convert to UTF-8, and risk truncation
> 2. Store as ASCII and replace out-of-range characters with "?"
> 3. Just store the Latin-1 text in a type labelled "ASCII", and live with it.
> 4. Raise an exception if non-ASCII characters are present
>
> Realistically, h5py might go with (3) as the ASCII type in HDF5 is
> much abused already.
>

I was working on the assumption that (3) would be the best choice, for the
reason you gave and to minimize breakage in transitioning to Python 3.
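
For concreteness, the four options in Andrew's list can be sketched in plain
Python 3. This is only an illustration, not what h5py or PyTables actually do:
the helper names and the 5-byte field width are made up for the example, and
it assumes a fixed-width byte field, which is where the truncation risk in (1)
comes from (a non-ASCII Latin-1 character takes one byte in Latin-1 but two in
UTF-8).

```python
# Hypothetical helpers illustrating the four options; none of these names
# come from h5py, PyTables, or HDF5.
FIELD_WIDTH = 5  # assumed fixed-width storage, in bytes


def option1_utf8_truncate(s, width=FIELD_WIDTH):
    """(1) Convert to UTF-8 and cut to the field width; may lose data."""
    raw = s.encode("utf-8")[:width]
    # If the cut landed inside a multi-byte sequence, drop the partial bytes.
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


def option2_ascii_replace(s):
    """(2) Store as ASCII, replacing out-of-range characters with '?'."""
    return s.encode("ascii", errors="replace")


def option3_latin1_as_ascii(s):
    """(3) Store the raw Latin-1 bytes; the file would label them 'ASCII'."""
    return s.encode("latin-1")


def option4_strict_ascii(s):
    """(4) Raise UnicodeEncodeError if non-ASCII characters are present."""
    return s.encode("ascii")


if __name__ == "__main__":
    text = "abcd\u00e9"  # 5 characters; 5 bytes in Latin-1, 6 in UTF-8
    print(option1_utf8_truncate(text))    # the last character is lost
    print(option2_ascii_replace(text))    # the last character becomes '?'
    print(option3_latin1_as_ascii(text))  # all 5 bytes kept, one is > 0x7F
    try:
        option4_strict_ascii(text)
    except UnicodeEncodeError as exc:
        print("rejected:", exc.reason)
```

Only (3) round-trips existing Latin-1 data without loss or an error, at the
cost of the stored bytes not matching the declared "ASCII" type.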

- Tom

>
> Andrew