Re: [Numpy-discussion] proposal: smaller representation of string arrays

25 Apr 2017

2017-04-25 12:34 GMT-04:00 Chris Barker:
...
> I am totally euro-centric, but as I understand it, that is the whole point
> of the desire for a compact one-byte-per-character encoding. If there is a
> strong need for other 1-byte encodings (Shift-JIS, maybe?) then maybe we
> should support that. But this all started with "mostly ascii". My take on
> that is:
But Shift-JIS is not one-byte; it's two-byte (unless you allow only
half-width characters and nothing else). :-) In fact legacy CJK
encodings are all nominally two-byte (so that the width of a
character's internal representation matches that of its visual
representation).
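A rough standard-library Python sketch of the width point: it reports, per character, the Unicode East Asian width class and the number of bytes the character takes in Shift-JIS. The sample characters (a kanji, a half-width katakana, and ASCII 'a') and the helper name `shift_jis_widths` are illustrative choices, not anything from the thread.

```python
import unicodedata


def shift_jis_widths(text: str) -> list[tuple[str, str, int]]:
    """For each character, return (char, East Asian width class, Shift-JIS byte count)."""
    rows = []
    for ch in text:
        # 'W' = wide (two display columns), 'H' = half-width, 'Na' = narrow
        width_class = unicodedata.east_asian_width(ch)
        nbytes = len(ch.encode("shift_jis"))
        rows.append((ch, width_class, nbytes))
    return rows


# Illustrative sample: full-width kanji, half-width katakana, ASCII letter
for ch, width_class, nbytes in shift_jis_widths("日ｶa"):
    print(f"U+{ord(ch):04X}  width={width_class:<2}  shift_jis_bytes={nbytes}")
```

The wide kanji comes out at 2 bytes, while the half-width katakana and the ASCII letter come out at 1 byte each, so byte count tracks display width rather than being a flat one byte per character.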
...
> - filenames
> File names are one of the key reasons folks struggled with the Python 3 data
> model (particularly on *nix) and why 'surrogateescape' was added. It's
> pretty common to store filenames in with our data, and thus in numpy arrays
> -- we need to preserve them exactly and display them mostly right. Again,
> euro-centric, but if you are euro-centric, then latin-1 is a good choice for
> this.
This I don't understand. As far as I can tell, non-Western-European
filenames are not unusual. If filenames are the reason, then even if
you're euro-centric (think Eastern Europe, say), I don't see how latin-1
is a good choice.
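A rough standard-library Python sketch of the trade-off being argued: latin-1 maps every byte 0x00-0xFF to one code point, so it preserves raw filename bytes exactly, but it cannot represent characters such as 'Ł' or 'ź', so a UTF-8 Polish name round-trips only as garbled text. The filenames, the helper names, and the assumption that the filesystem stores UTF-8 bytes are all hypothetical; `surrogateescape` is the standard Python error handler mentioned in the quote.

```python
def fits_latin1(name: str) -> bool:
    """True if every character of `name` is representable in latin-1."""
    try:
        name.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def latin1_roundtrips(raw: bytes) -> bool:
    """latin-1 decodes any byte sequence, so the raw bytes always come back unchanged."""
    return raw.decode("latin-1").encode("latin-1") == raw


def surrogateescape_roundtrips(raw: bytes) -> bool:
    """Decode as UTF-8, carrying undecodable bytes through as lone surrogates."""
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape") == raw


name = "Łódź.csv"            # hypothetical Polish filename
raw = name.encode("utf-8")   # assumption: the filesystem stores UTF-8 bytes
bad = b"report-\xff.txt"     # hypothetical name that is not valid UTF-8

print(fits_latin1(name))                # False: 'Ł' and 'ź' are not in latin-1
print(latin1_roundtrips(raw))           # True: the bytes survive exactly
print(repr(raw.decode("latin-1")))      # bytes preserved, but the text is mojibake
print(surrogateescape_roundtrips(bad))  # True
```

So latin-1 satisfies "preserve them exactly" but not "display them mostly right" for names outside Western Europe.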

Lurker here, and I haven't touched numpy in ages. So I might be
blurting out nonsense.

-- 
Ambrose Li // http://o.gniw.ca / http://gniw.ca
If you saw this on CE-L: You do not need my permission to quote
me, only proper attribution. Always cite your sources, even if
you have to anonymize and/or cite it as "personal communication".
