
On 27 October 2015 at 13:12, Daniel Holth <dholth@gmail.com> wrote:
> We must do the hard work to support Unicode file names, and spaces and accent marks in home directory names (historically a problem on Windows), in our packaging system. It is the right thing to do. It is not the publisher's fault that your system has broken Unicode.

In the examples I'm thinking of, the publisher used a format (.tar.gz) that didn't properly support Unicode, in the sense that it didn't include an encoding for the bytes it used to represent filenames. IMO, that is something we shouldn't allow, by rejecting file formats that don't support Unicode properly.

Whose fault it is, is not important - it's just as easy to say that it's not the end user's fault that the publisher made an unwarranted assumption about encodings. What's important is that things work for everyone, and the interoperability standards don't leave room for people to make such assumptions.

Paul

PS Consider this a retraction of my suggestion that filenames in sdists should be pure ASCII. But still, sdists shouldn't contain files that can't be used on target systems - e.g., 2 files whose names differ only in case, files containing characters like :, ? or * that are invalid on Windows... Whether this needs to be noted in the standard, or whether it's just a case of directing users' bug reports back to the publisher, is an open question, though.
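
To make the two portability problems named in the PS concrete, here is a minimal sketch of how a tool might flag them. It assumes the archive member names have already been decoded to text. The function name `find_unportable_names` and the constant `WINDOWS_INVALID_CHARS` are hypothetical, invented for illustration, and are not part of any existing packaging tool or standard.

```python
from collections import defaultdict

# Assumption: characters rejected in Windows file names. "/" is left out
# because archive member names use it to separate path components.
WINDOWS_INVALID_CHARS = set('<>:"|?*\\')


def find_unportable_names(names):
    """Return (case_collisions, invalid_names) for an iterable of member names.

    case_collisions is a sorted list of groups of names that differ only in case.
    invalid_names is a sorted list of names containing a Windows-invalid character.
    """
    by_folded = defaultdict(set)
    invalid = []
    for name in names:
        # Names with the same case-folded form would clash on a
        # case-insensitive file system.
        by_folded[name.casefold()].add(name)
        if any(ch in WINDOWS_INVALID_CHARS for ch in name):
            invalid.append(name)
    collisions = sorted(sorted(group) for group in by_folded.values() if len(group) > 1)
    return collisions, sorted(invalid)
```

For a real sdist, the names could come from `tarfile.open(path).getnames()`. Whether a hit should be a hard error or only a warning directed back at the publisher is exactly the open question raised above, so the sketch only reports what it finds.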