On Apr 25, 2017 9:35 AM, "Chris Barker" <chris.barker@noaa.gov> wrote:

- filenames

File names are one of the key reasons folks struggled with the python3 data model (particularly on *nix) and why 'surrogateescape' was added. It's pretty common to store filenames in with our data, and thus in numpy arrays -- we need to preserve them exactly and display them mostly right. Again, euro-centric, but if you are euro-centric, then latin-1 is a good choice for this.

Eh... First, on Windows and macOS, filenames are natively Unicode. So you don't care about preserving the bytes, only the characters. It's only Linux and the other traditional unixes where filenames are natively bytestrings. And then from within Python, if you want to actually work with those filenames you need to either have a bytestring type or else a Unicode type that uses surrogateescape to represent the bytes that don't decode. I'm not seeing how latin1 really helps anything here -- best case you still have to do something like the WSGI "encoding dance" (encode back to latin1 to recover the original bytes, then decode those properly) before you could use the filenames. IMO if you have filenames that are arbitrary bytestrings and you need to represent this properly, you should just use bytestrings -- really, they're perfectly friendly :-).
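To make that concrete, here's a rough sketch of the three options. The filename bytes are made up for illustration, and I'm spelling the codecs out explicitly rather than going through os.fsencode()/os.fsdecode(), which on POSIX apply surrogateescape with whatever the filesystem encoding happens to be:

```python
# Hypothetical filename bytes, as a Linux filesystem might hand them back.
# b"\xe9" is "é" in latin-1 but is not valid UTF-8 on its own.
raw = b"caf\xe9.txt"

# Option 1: keep the bytestring. Nothing to decode, nothing to lose.
as_bytes = raw

# Option 2: a str that uses surrogateescape. The undecodable byte is
# smuggled through as a lone surrogate and restored on encode.
as_str = raw.decode("utf-8", errors="surrogateescape")
back = as_str.encode("utf-8", errors="surrogateescape")

# Option 3: store as latin-1. Every byte maps to some character, so the
# round trip is exact, but to get the str form from option 2 you still
# have to undo the latin-1 step first (the "encoding dance").
as_latin1 = raw.decode("latin-1")
usable = as_latin1.encode("latin-1").decode("utf-8", errors="surrogateescape")
```

All three preserve the bytes. The difference is that latin-1 only displays correctly if the name really was latin-1, and it adds an extra conversion before the name is in the form from option 2.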

-n