[Numpy-discussion] proposal: smaller representation of string arrays

Aldcroft, Thomas aldcroft at head.cfa.harvard.edu
Tue Apr 25 22:02:38 EDT 2017


On Tue, Apr 25, 2017 at 7:11 PM, Chris Barker - NOAA Federal <
chris.barker at noaa.gov> wrote:

> > On Apr 25, 2017, at 12:38 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
> > Eh... First, on Windows and MacOS, filenames are natively Unicode.
>
> Yeah, though once they are stored in a text file -- who the heck
> knows? That may be simply unsolvable.
>
> > s. And then from in Python, if you want to actually work with those
> > filenames you need to either have a bytestring type or else a Unicode type
> > that uses surrogateescape to represent the non-ascii characters.
>
>
> > IMO if you have filenames that are arbitrary bytestrings and you need to
> > represent this properly, you should just use bytestrings -- really, they're
> > perfectly friendly :-).
>
> I thought the Python file (and Path) APIs all required (Unicode)
> strings? That was the whole complaint!
>
> And no, bytestrings are not perfectly friendly in py3.
>
> This got really complicated and sidetracked, but all I'm suggesting is
> that if we have a 1-byte-per-char string type with a fixed encoding,
> that encoding should be Latin-1 rather than ASCII.
>
> That's it, really.
>

Fully agreed.
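
To make the Latin-1 vs. ASCII point concrete, here is a minimal sketch
in plain Python (no numpy). It only uses the standard codecs; the
variable names are mine, chosen for illustration. Latin-1 assigns a
character to every one of the 256 byte values, so decoding cannot fail
and re-encoding gives back the same bytes, while ASCII rejects anything
with the high bit set.

```python
# All 256 possible byte values, standing in for "ASCII-ish" legacy data.
raw = bytes(range(256))

# Latin-1 maps byte N to code point N, so decoding always succeeds
# and the result has exactly one character per byte.
text = raw.decode("latin-1")
assert len(text) == len(raw)

# Encoding back restores the original bytes exactly.
assert text.encode("latin-1") == raw

# ASCII only covers 0x00-0x7F and raises on the first byte above that.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print("ASCII could decode everything:", ascii_ok)
```

That lossless round trip is what makes Latin-1 a safe default for data
that is mostly, but not strictly, ASCII.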


>
> Having a settable encoding would work fine, too.
>

Yup.

At a simple level, I just want the things that currently work just fine in
Py2 to start working in Py3. That includes being able to read / manipulate
/ compute and write back to legacy binary FITS and HDF5 files that include
ASCII-ish text data (not strictly ASCII).  Memory mapping such files should
be supportable.  Swapping type from bytes to a 1-byte char str should be
possible without altering data in memory.

BTW, I am saying "I want", but this functionality would definitely be
welcome in astropy.  I wrote a unicode sandwich workaround for the astropy
Table class (https://github.com/astropy/astropy/pull/5700) which should be
in the next release.  It would be way better to have this at a lower level,
in numpy itself.

- Tom


>
> -CHB
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>