[Python-Dev] Re: pth file encoding

17 Mar 2021

On 3/17/2021 7:34 PM, Ivan Pozdeev via Python-Dev wrote:
...
On 17.03.2021 20:30, Steve Dower wrote:
...
On 3/17/2021 8:00 AM, Michał Górny wrote:
...
How about writing paths as bytestrings in the long term?  I think this
should eliminate the necessity of knowing the correct encoding for
the filesystem.
That's what we're trying to do; the problem is that they start as
strings, and so we need to convert them to a bytestring.
That conversion is the encoding ;)
And yeah, for reading, I'd use a UTF-8 reader that falls back to the
locale encoding on failure (and restarts reading the file). But for
writing, we need the tools that create these files (including Notepad!)
to use the encoding we want.
I don't see a problem with using a file encoding specification like in 
Python source files.
Since site.py is under our control, we can introduce it easily.
We can opt to allow only UTF-8 here -- then we wait out a transitional
period and disallow anything other than UTF-8 (then the specification
can be removed, too).
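
For illustration only, a minimal sketch of what detecting such a
declaration could look like. This is not an existing site.py feature:
`declared_encoding` is a hypothetical helper, and the pattern is modeled
on the one PEP 263 gives for source files. It relies on .pth files
already treating lines that start with `#` as comments.

```python
import re

# Modeled on the PEP 263 coding-declaration pattern, applied to raw bytes
# so the line can be inspected before any encoding has been chosen.
_CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(first_line):
    """Return the encoding named in a comment line, or None if absent.

    Hypothetical helper: `first_line` is the first line of a .pth file,
    read in binary mode.
    """
    match = _CODING_RE.match(first_line)
    if match is None:
        return None
    # The character class only admits ASCII, so this decode cannot fail.
    return match.group(1).decode("ascii")
```

An ordinary path line or an `import` line yields `None`, so files without
a declaration would be handled exactly as before.
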
The only thing we can introduce *easily* is an error when the 
(exclusively third-party) tools that create them aren't up to date. 
Getting everyone to specify the encoding we want is a much bigger 
problem with a much slower solution.

This particular file is probably the worst-case scenario, but preferring
UTF-8 and handling existing files with a fallback is the best we can do 
(especially since an assumption of UTF-8 can be invalidated on a 
particular file, whereas most locale encodings cannot). Once we openly 
document that it should be UTF-8, tools will have a chance to catch up, 
and eventually the fallback will become harmless.
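
A rough sketch of the "prefer UTF-8, fall back to the locale encoding"
reading strategy described above. It is an illustration under
assumptions, not the actual site.py code: `read_pth_text` and its
`fallback_encoding` parameter are hypothetical names, and accepting a BOM
via `utf-8-sig` is an extra choice made here. Decoding the same bytes a
second time stands in for restarting the read.

```python
import locale


def read_pth_text(path, fallback_encoding=None):
    """Decode a .pth file as UTF-8, falling back to the locale encoding.

    Hypothetical helper; `fallback_encoding` defaults to the locale's
    preferred encoding when not given.
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        # "utf-8-sig" decodes plain UTF-8 and also drops a leading BOM.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        if fallback_encoding is None:
            fallback_encoding = locale.getpreferredencoding(False)
        # Retry from the start of the data with the legacy encoding.
        return data.decode(fallback_encoding)
```

The order matters: a single-byte encoding such as Latin-1 accepts any
byte sequence, so only the UTF-8 attempt can fail and trigger the
fallback. Trying the locale encoding first would silently misread UTF-8
files.
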

Cheers,
Steve