eryk sun writes:
On Wed, Aug 17, 2016 at 9:35 AM, Stephen J. Turnbull
BTW, why "surrogate pairs"? Does Windows validate surrogates to ensure they come in pairs, but not necessarily in the right order (or perhaps sometimes they resolve to non-characters such as U+1FFFF)?
Microsoft's filesystems remain compatible with UCS2
So it's not just invalid surrogate *pairs*, it's invalid surrogates of all kinds. This means that it's theoretically possible (though I gather that it's unlikely in the extreme) for a real Windows filename to be indistinguishable from one generated by Python's surrogateescape handler.

What happens when Python's directory manipulation functions on Windows encounter such a filename? Do they try to write it to the disk directory? Do they succeed? Does that depend on surrogateescape?

Is there a reason in practice to allow surrogateescape at all on names in Windows filesystems, at least when using the *W API? You mention non-Microsoft filesystems; are they common enough to matter?

I admit that as we converge on sanity (UTF-8 for text/* content, some kind of Unicode for filesystem names) none of this is very likely to matter, but I'm a worrywart....

Steve
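
For concreteness, here is a minimal sketch of the collision described above. It assumes CPython 3.4 or later (where the utf-16 codecs accept the surrogatepass handler), and both names are made up for illustration: one is a POSIX-style byte string that is not valid UTF-8, the other is a hypothetical sequence of 16-bit code units that a non-validating filesystem could hand back. It says nothing about what the Windows directory functions actually do with such a name.

```python
# Hypothetical POSIX-style filename containing a byte that is not valid UTF-8.
raw = b"caf\xe9"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
escaped = raw.decode("utf-8", "surrogateescape")

# Hypothetical filename stored as raw little-endian 16-bit code units, where
# the last unit (0xDCE9) is a lone low surrogate that nothing validated.
utf16_units = b"c\x00a\x00f\x00\xe9\xdc"
from_windows = utf16_units.decode("utf-16-le", "surrogatepass")

# Once both are Python str objects, their origin can no longer be told apart.
same_string = escaped == from_windows

# Encoding either one with surrogateescape yields the original invalid bytes.
round_trip = from_windows.encode("utf-8", "surrogateescape")

# A strict encode refuses the lone surrogate instead.
try:
    from_windows.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

After this runs, same_string and strict_failed are both True, and round_trip equals raw, which is the ambiguity in question: the str alone does not record whether U+DCE9 came from an escaped byte or from the name itself.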