[Numpy-discussion] proposal: smaller representation of string arrays
njs at pobox.com
Tue Apr 25 15:37:16 EDT 2017
On Apr 25, 2017 9:35 AM, "Chris Barker" <chris.barker at noaa.gov> wrote:
> File names are one of the key reasons folks struggled with the python3 data
> model (particularly on *nix) and why 'surrogateescape' was added. It's
> pretty common to store filenames in with our data, and thus in numpy arrays
> -- we need to preserve them exactly and display them mostly right. Again,
> euro-centric, but if you are euro-centric, then latin-1 is a good choice

Eh... First, on Windows and MacOS, filenames are natively Unicode. So you
don't care about preserving the bytes, only the characters. It's only Linux
and the other traditional unixes where filenames are natively bytestrings.
And then from within Python, if you want to actually work with those filenames
you need to either have a bytestring type or else a Unicode type that uses
surrogateescape to represent the undecodable bytes. I'm not seeing how
latin1 really helps anything here -- best case you still have to do
something like the wsgi "encoding dance" before you could use the
filenames. IMO if you have filenames that are arbitrary bytestrings and you
need to represent this properly, you should just use bytestrings -- really,
they're perfectly friendly :-).
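
To make the three options concrete, here is a rough sketch using only the
standard library. The filename and the helper names (to_text, to_bytes,
undo_latin1) are made up for illustration, and I'm assuming the "real"
encoding is UTF-8; none of this is NumPy API.

```python
# Hypothetical Linux filename that is NOT valid UTF-8
# (0xE9 is 'é' in latin-1, but an incomplete sequence in UTF-8).
RAW = b"caf\xe9.txt"


def to_text(raw: bytes) -> str:
    """Decode as UTF-8, smuggling undecodable bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Inverse of to_text: turn the escaped surrogates back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


def undo_latin1(text: str) -> str:
    """The wsgi-style 'encoding dance': text that was decoded as latin-1 is
    re-encoded to recover the bytes, then decoded with the assumed real encoding."""
    return text.encode("latin-1").decode("utf-8", errors="surrogateescape")


# Option 1: plain bytes. Nothing to do, and nothing is lost.
as_bytes = RAW

# Option 2: str with surrogateescape. Round-trips exactly, but the str
# contains a lone surrogate and cannot be encoded as strict UTF-8.
as_text = to_text(RAW)
roundtrip = to_bytes(as_text)

# Option 3: latin-1. Every byte maps to some character, so it never fails,
# but a UTF-8 filename comes out as mojibake until you do the dance.
utf8_name = "café.txt".encode("utf-8")
mojibake = utf8_name.decode("latin-1")
recovered = undo_latin1(mojibake)
```

With latin-1 you always have to remember which strings still need the dance
before you can hand them to the OS or show them to a user; with bytes there
is no hidden state to track.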