Re: [Numpy-discussion] proposal: smaller representation of string arrays

26 Apr 2017

On Tue, Apr 25, 2017 at 7:11 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
...
...
On Apr 25, 2017, at 12:38 PM, Nathaniel Smith <njs@pobox.com> wrote:
...
Eh... First, on Windows and MacOS, filenames are natively Unicode.
...
s. And then from within Python, if you want to actually work with those
filenames you need to either have a bytestring type or else a Unicode type
that uses surrogateescape to represent the non-ascii characters.
Yeah, though once they are stored in a text file -- who the heck
knows? That may be simply unsolvable.
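
For readers unfamiliar with it, the surrogateescape round trip mentioned above can be sketched in a few lines of plain Python. The filename is made up for illustration; the only assumption is that it contains a byte that is not valid UTF-8, as can happen on Linux.

```python
# Hypothetical filename containing a byte (0xFF) that is not valid UTF-8.
raw_name = b"report-\xff.txt"

# surrogateescape maps each undecodable byte to a lone surrogate
# in the range U+DC80..U+DCFF instead of raising an error.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same error handler restores the original bytes exactly.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate, which is why such strings
# cannot simply be written out as ordinary UTF-8 text.
try:
    as_text.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```

The round trip only works when both ends agree to use the same handler, which is the part that gets lost once the name has been written into a text file.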
...
IMO if you have filenames that are arbitrary bytestrings and you need to
represent this properly, you should just use bytestrings -- really, they're
perfectly friendly :-).
I thought the Python file (and Path) APIs all required (Unicode)
strings? That was the whole complaint!
And no, bytestrings are not perfectly friendly in py3.
This got really complicated and sidetracked, but all I'm suggesting is
that if we have a 1-byte-per-char string type with a fixed encoding,
that encoding should be Latin-1 rather than ASCII.
That's it, really.
Fully agreed.
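
A quick sketch of why Latin-1 is the more forgiving choice for a fixed 1-byte encoding: every byte value decodes to exactly one code point and encodes back unchanged, whereas ASCII rejects anything at or above 0x80. This uses only built-in codecs and says nothing about how a numpy dtype would be implemented.

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# Latin-1 maps byte n to code point n, so decoding can never fail.
text = raw.decode("latin-1")
code_points = [ord(ch) for ch in text]

# Encoding back gives the identical byte sequence (lossless round trip).
round_tripped = text.encode("latin-1")

# ASCII only covers 0x00..0x7F, so the same data cannot be decoded.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```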
...
Having a settable encoding would work fine, too.
Yup.

At a simple level, I just want the things that currently work just fine in
Py2 to start working in Py3. That includes being able to read / manipulate
/ compute and write back to legacy binary FITS and HDF5 files that include
ASCII-ish text data (not strictly ASCII).  Memory mapping such files should
be supportable.  Swapping type from bytes to a 1-byte char str should be
possible without altering data in memory.
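
To make the cost of the current situation concrete, here is a minimal sketch using existing numpy calls. The buffer is a made-up stand-in for a fixed-width text column from a legacy file; the field width of 8 and the sample values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical legacy data: two 8-byte fixed-width fields, Latin-1 encoded.
raw = b"caf\xe9    M\xfcnchen "

# Viewing the buffer as bytestrings is zero-copy: 1 byte per character.
as_bytes = np.frombuffer(raw, dtype="S8")

# Getting str elements today means decoding into a new unicode array,
# which numpy stores as 4 bytes per character.
as_text = np.char.decode(as_bytes, "latin-1")

bytes_per_item = as_bytes.itemsize   # 8
text_per_item = as_text.itemsize     # 32

# Encoding back reproduces the original buffer, but only via another copy.
restored = np.char.encode(as_text, "latin-1").tobytes()
```

So the data survives the round trip, but not in place: the decoded array is a separate allocation four times the size, which is exactly what makes memory-mapped files awkward.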

BTW, I am saying "I want", but this functionality would definitely be
welcome in astropy.  I wrote a unicode sandwich workaround for the astropy
Table class (https://github.com/astropy/astropy/pull/5700) which should be
in the next release.  It would be way better to have this at a lower level,
in numpy.

- Tom
...
-CHB
Aldcroft, Thomas