On Wed, Mar 17, 2021 at 1:11 AM Michał Górny email@example.com wrote:
On Wed, 2021-03-17 at 13:55 +0900, Inada Naoki wrote:
OK. setuptools doesn't specify an encoding at all, so the locale-specific encoding is used. We cannot fix it in the short term.
How about writing paths as bytestrings in the long term? I think this should eliminate the necessity of knowing the correct encoding for the filesystem.
On Linux and many Unixes, there is no "correct" filesystem encoding. ASCII and UTF-8 are probably the most common encodings for individual files, maybe even large collections of files, but nevertheless, paths are bytestrings. Treating paths as UTF-8 works fine for most files, but once in a while there'll be a filename that fails to convert, and that's not the fault of the filename.
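For example, what happens if you need to handle a file created with touch "Ma$(echo | tr '\012' '\361')ana"? That name contains the single byte 0xF1 (octal 361, "ñ" in Latin-1), which is not valid UTF-8 on its own.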
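To make that concrete, here is a minimal Python sketch of what happens to such a name. It touches no filesystem and only works on the six bytes the touch command would produce. The helper try_strict_utf8 and the variable names are made up for illustration.

```python
# The name produced by the shell command above, as raw bytes:
# 'M', 'a', 0xF1 (octal 361), 'a', 'n', 'a'.
raw_name = b"Ma\xf1ana"


def try_strict_utf8(name: bytes):
    """Return the decoded name, or None if it is not valid UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Strict UTF-8 fails: 0xF1 starts a multi-byte sequence, but the
# next byte ('a') is not a continuation byte.
strict = try_strict_utf8(raw_name)

# Latin-1 gives a readable result, but only because we guessed
# the encoding the file's creator had in mind.
as_latin1 = raw_name.decode("latin-1")

# The surrogateescape error handler (PEP 383) maps the undecodable
# byte to the lone surrogate U+DCF1, so encoding with the same
# handler restores the original bytes exactly.
escaped = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = escaped.encode("utf-8", errors="surrogateescape")
```

Keeping the name as bytes avoids the question entirely. surrogateescape is the mechanism Python uses to carry such names through str when it has to, at the cost of strings that cannot be printed or written out as strict UTF-8.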
For a presentation application (for example), assuming UTF-8 is probably fine, maybe even a good thing. But for a filesystem backup tool, it's important not to assume an encoding, so that you can back up and restore all filenames irrespective of the encoding the files' creators intended.
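As a rough sketch of the backup-tool side of this, assuming Python on a POSIX system: passing a bytes path to os.listdir makes it return bytes names, so nothing ever gets decoded. The function names snapshot_names and restore_names are hypothetical, and a real tool would copy contents and metadata too.

```python
import os


def snapshot_names(directory):
    """Return the entry names in `directory` as undecoded bytes."""
    # os.fsencode accepts str or bytes. Given a bytes path,
    # os.listdir returns each name as bytes, exactly as stored.
    return sorted(os.listdir(os.fsencode(directory)))


def restore_names(directory, names):
    """Recreate empty files with the given bytes names in `directory`."""
    target = os.fsencode(directory)
    for name in names:
        # os.path.join and open both accept bytes paths, so the
        # name is never interpreted in any particular encoding.
        with open(os.path.join(target, name), "wb"):
            pass
```

With this shape, a name like b"Ma\xf1ana" would pass through snapshot and restore unchanged on a filesystem that accepts arbitrary bytes. Some filesystems reject names that are not valid UTF-8, so whether that file can be created at all depends on the platform.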