
2017-04-25 12:34 GMT-04:00 Chris Barker <chris.barker@noaa.gov>:
I am totally euro-centric, but as I understand it, that is the whole point of the desire for a compact one-byte-per-character encoding. If there is a strong need for other 1-byte encodings (shift-JIS, maybe?) then maybe we should support that. But this all started with "mostly ascii". My take on that is:
But Shift-JIS is not one-byte; it's two-byte (unless you allow only half-width characters and nothing else). :-) In fact legacy CJK encodings are all nominally two-byte (so that the width of a character's internal representation matches that of its visual representation).
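To make the width point concrete, here is a minimal sketch using Python's built-in "shift_jis" codec. The helper name encoded_width is made up for illustration, and the four sample characters are my own picks, not anything from the thread.

```python
# Sketch: how many bytes a single character occupies under a given codec.
# `encoded_width` is a hypothetical helper, not part of numpy or Python.

def encoded_width(ch, encoding="shift_jis"):
    """Return the number of bytes `ch` takes when encoded with `encoding`."""
    return len(ch.encode(encoding))

samples = {
    "A": "ASCII letter",
    "\uff71": "half-width katakana A",
    "\u30a2": "full-width katakana A",
    "\u6f22": "kanji (the 'kan' of 'kanji')",
}

for ch, label in samples.items():
    print(f"{label}: {encoded_width(ch)} byte(s) in Shift-JIS")
```

Only the ASCII and half-width cases fit in one byte; the full-width kana and the kanji each need two, so a fixed one-byte-per-character dtype could not hold general Shift-JIS text.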
- filenames
File names are one of the key reasons folks struggled with the Python 3 data model (particularly on *nix) and why 'surrogateescape' was added. It's pretty common to store filenames with our data, and thus in numpy arrays -- we need to preserve them exactly and display them mostly right. Again, euro-centric, but if you are euro-centric, then latin-1 is a good choice for this.
This I don't understand. As far as I can tell non-Western-European filenames are not unusual. If filenames are a reason, even if you're euro-centric (think Eastern Europe, say) I don't see how latin1 is a good choice.

Lurker here, and I haven't touched numpy in ages. So I might be blurting out nonsense.

-- 
Ambrose Li // http://o.gniw.ca / http://gniw.ca
If you saw this on CE-L: You do not need my permission to quote me, only proper attribution. Always cite your sources, even if you have to anonymize and/or cite it as "personal communication".
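P.S. A minimal sketch of the filename question, assuming a Polish name stored on disk as UTF-8 bytes. The filename and both helper names are invented for illustration. It separates the two requirements from the quoted text: "preserve exactly" and "display mostly right".

```python
# Sketch: latin-1 and surrogateescape both preserve arbitrary bytes,
# but latin-1 cannot represent (or correctly display) this Polish name.

def roundtrips(raw, encoding, errors="strict"):
    """True if decoding then re-encoding `raw` gives the same bytes back."""
    return raw.decode(encoding, errors).encode(encoding, errors) == raw

def can_encode(name, encoding):
    """True if every character of `name` exists in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

name = "\u017c\u00f3\u0142w.txt"   # hypothetical filename, Polish for "turtle"
raw = name.encode("utf-8")         # bytes as a UTF-8 filesystem would store them

print(roundtrips(raw, "latin-1"))                   # bytes survive
print(roundtrips(raw, "ascii", "surrogateescape"))  # bytes survive
print(raw.decode("latin-1") == name)                # but the text is mojibake
print(can_encode(name, "latin-1"))                  # not representable
print(can_encode(name, "iso8859_2"))                # Latin-2 covers it
```

So latin-1 does meet the "preserve exactly" requirement for any byte string, but for a name like this one it fails "display mostly right", which is the part I was questioning.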