On Wed, 17 Mar 2021 at 08:52, Inada Naoki email@example.com wrote:
On Windows, it must be UTF-8. For example, we use `chcp 65001` in `activate.bat` to support Unicode paths. On Unix, a raw path is a bytestring, so paths can be written as-is. Python decodes them with fsencoding.
Remember that .pth files contain executable code as well as paths, so fsencoding is not correct for a .pth file as a whole.
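To illustrate the mix of code and paths, here is a minimal sketch of how each line of a .pth file gets classified. It mirrors the checks in `site.py` as I understand them (comment and blank lines skipped, lines starting with `import` followed by a space or tab executed, everything else treated as a path); the function name `classify_pth_line` is made up for this example and is not part of `site.py`.

```python
def classify_pth_line(line: str) -> str:
    """Return 'skip', 'code', or 'path' for one line of a .pth file.

    Hypothetical helper for illustration only; site.py does this inline
    and actually exec()s the 'code' lines rather than just labelling them.
    """
    # Comment lines and blank lines are ignored.
    if line.startswith("#") or line.strip() == "":
        return "skip"
    # Lines beginning with "import" plus a space or tab are executed as code.
    if line.startswith(("import ", "import\t")):
        return "code"
    # Anything else is treated as a directory to add to sys.path.
    return "path"
```

So a single file can hold both a directory name (where the filesystem encoding might seem natural) and arbitrary Python source (where it does not).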
So I think this is the ideal solution. But it requires platform-specific code in site.py. I don't think .pth files are important enough for this complexity.
.pth files are pretty important in the packaging community. I'd strongly support making their format and behaviour more precisely defined.
A sub-optimal idea is to use UTF-8. It is the best encoding for Windows, and most Unix systems use UTF-8 too.
+1. IMO, UTF-8 is the only reasonable choice here.
The problem is with the transition - we need to find a way to deal with existing `.pth` files, and with people using older versions of tools (like setuptools and pipx) that write `.pth` files (so we can't assume, for example, that Python 3.12 will never see a .pth file using the old-style encoding).
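One possible way to cope with old-style files during a transition, sketched here purely as an assumption and not as anything that has been agreed, is to try UTF-8 first and fall back to the locale encoding only if that fails. The function name `decode_pth_bytes` and its `legacy_encoding` parameter are hypothetical.

```python
import locale


def decode_pth_bytes(data: bytes, legacy_encoding=None) -> str:
    """Decode raw .pth content, preferring UTF-8.

    Hypothetical transition helper: falls back to the locale's preferred
    encoding (or an explicitly supplied legacy encoding) when the bytes
    are not valid UTF-8.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        if legacy_encoding is None:
            # The old-style behaviour: whatever the locale says.
            legacy_encoding = locale.getpreferredencoding(False)
        # This second decode can still raise if the file is in neither encoding.
        return data.decode(legacy_encoding)
```

The weakness is that some byte sequences are valid in both encodings, so a legacy file could occasionally be misread as UTF-8 without any error being raised.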
It's worth noting that using the default encoding is the *correct* way of writing .pth files at the moment (as that's how site.py reads them - see https://github.com/python/cpython/blob/master/Lib/site.py#L173) so this is technically a file format change - tools writing .pth files will *have* to include version-specific code if they want to support multiple versions of Python. We need to be very clear about this - it's not just a case of "tools need to specify the encoding".
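To make the "version-specific code" point concrete, here is a rough sketch of what a tool writing .pth files might end up doing. The cutoff version is a placeholder (no version has been decided), and `pth_encoding_for` and `encode_pth` are hypothetical names. The sketch also assumes the target interpreter uses the same locale as the process doing the writing.

```python
import locale

# Placeholder only: the first Python version assumed to read .pth files as UTF-8.
HYPOTHETICAL_UTF8_SINCE = (3, 12)


def pth_encoding_for(target_version, utf8_since=HYPOTHETICAL_UTF8_SINCE) -> str:
    """Pick the encoding to use for a .pth file read by `target_version`."""
    if tuple(target_version[:2]) >= tuple(utf8_since):
        return "utf-8"
    # Older interpreters read .pth files with the default (locale) encoding.
    return locale.getpreferredencoding(False)


def encode_pth(lines, target_version) -> bytes:
    """Encode .pth lines for the given target interpreter version."""
    text = "".join(line + "\n" for line in lines)
    return text.encode(pth_encoding_for(target_version))
```

A tool would then write the returned bytes to the file in binary mode, passing the version of the interpreter that will read the file (for instance `sys.version_info` when that is the running interpreter).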