On Tue, Aug 16, 2016 at 3:56 PM, Steve Dower <steve.dower@python.org> wrote:
> 2. Windows file system encoding is *always* UTF-16. There's no "assuming mbcs" or "assuming ACP" or "assuming UTF-8" or "asking the OS what encoding it is". We know exactly what the encoding is on every supported version of Windows. UTF-16.
Internal filesystem details don't directly affect this issue, except for how each filesystem handles invalid surrogates in names passed to functions in the wide-character API. Some filesystems that are available on Windows reject a filename that has an invalid surrogate, so I think any program that attempts to create such malformed names is already broken. For example, with NTFS I can create a file named "\ud800b\ud800a\ud800d", but trying this in a VirtualBox shared folder fails because the VBoxSF filesystem can't transcode the name to its internal UTF-8 encoding. Thus I don't think supporting invalid surrogates should be a deciding factor in favor of UTF-16, which I think is an impractical choice.

Bytes coming from files, databases, and the network are likely to be either UTF-8 or some legacy encoding, so the practical choice is between ANSI/OEM and UTF-8. The reliable choice is UTF-8.

UTF-8 for bytes paths can first be adopted in 3.6 as an option that gets enabled via an environment variable. If it's not enabled or explicitly disabled, show a visible warning (i.e. not requiring -Wall) that legacy bytes paths are deprecated. In 3.7 UTF-8 can become the default, but the same environment variable should allow opting out to use the legacy encoding. The infrastructure put in place to support this should be able to work either way.

Victor, I haven't checked Steve's patch in issue 27781 yet, but making this change should simplify much of the Windows support code, as the bytes path conversion can be centralized, and relatively few functions return strings that need to be encoded back as bytes.

posixmodule.c will no longer need separate code paths that call *A functions, e.g.: CreateFileA, CreateDirectoryA, CreateHardLinkA, CreateSymbolicLinkA, DeleteFileA, RemoveDirectoryA, FindFirstFileA, MoveFileExA, GetFileAttributesA, GetFileAttributesExA, SetFileAttributesA, GetCurrentDirectoryA, SetCurrentDirectoryA, SetEnvironmentVariableA, ShellExecuteA
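
To make the centralized conversion idea concrete, here is a rough Python sketch. The real change would live in C in posixmodule.c, so treat this only as an illustration of the control flow. The environment variable name PYTHON_BYTES_PATH_UTF8 and both helper names are made up for this example; the proposal above doesn't name them. The sketch also assumes the warning is shown only when the variable is unset, i.e. neither enabled nor explicitly disabled.

```python
import warnings

# Hypothetical variable name, invented for this sketch.
ENV_VAR = "PYTHON_BYTES_PATH_UTF8"


def bytes_path_encoding(environ, legacy="mbcs"):
    """Return the encoding used to decode bytes paths.

    "1" opts in to UTF-8; any other explicit value opts out silently.
    An unset variable falls back to the legacy encoding with a warning.
    The default "mbcs" codec (ANSI code page) exists only on Windows.
    """
    setting = environ.get(ENV_VAR)
    if setting == "1":
        return "utf-8"
    if setting is None:
        # FutureWarning is displayed by default, so no -W option is needed.
        warnings.warn(
            "legacy bytes paths are deprecated; set %s=1 to use UTF-8" % ENV_VAR,
            FutureWarning,
            stacklevel=2,
        )
    return legacy


def decode_path(path, environ, legacy="mbcs"):
    """Single conversion point: every bytes path becomes str here,
    so callers only ever need the wide-character (*W) API."""
    if isinstance(path, str):
        return path
    return path.decode(bytes_path_encoding(environ, legacy))
```

With something like this in place, a caller would pass os.environ and hand the resulting str to the wide-character function, for example decode_path(b"caf\xc3\xa9", os.environ) when the option is enabled. Flipping the default in 3.7 would then only mean changing which value of the variable selects the legacy branch.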