[Python-Dev] Windows: Remove support of bytes filenames in the os module?

Wed Feb 10 02:56:25 EST 2016

On Feb 9, 2016, at 20:17, Stephen J. Turnbull <stephen at xemacs.org> wrote:

>> It really requires going through all the OS calls and either (a) making 
>> them consistently decode bytes to str using the declared FS encoding 
>> (currently 'mbcs', but I see no reason we can't make it 'utf_8'),
> 
> If it were that easy, it would have been done two decades ago.  I'm no
> fan of Windows[1], but it's obvious that Microsoft has devoted
> enormous amounts of brainpower to the problem of encoding
> rationalization since the early 90s.  I don't think they would have
> missed this idea.

Microsoft spent a lot of time and effort on the idea that UTF-16 (or, originally, UCS-2) everywhere was the answer. Never call the A functions (or the msvcrt functions that emulate the C and POSIX stdlib), and there's never a problem. What if you read filenames out of a text file? No problem; text files are UTF-16-BOM. Over a socket? All network protocols are also UTF-16. What if you have to read a file written in Unix? Come on, nobody's ever created a useful file without Windows. What about Windows 3.1? Uh... that's a problem. Also, what happens when Unicode goes over 64k characters? And so on. So their grand project failed.
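To make the "never call the A functions" point concrete, here is a small illustrative sketch (not from the thread) of why bytes-based ANSI paths lose information. It assumes the active ANSI code page is cp1252. It also assumes, as a simplification, that unrepresentable characters become '?'; real Windows conversions may apply "best fit" mappings instead. The helper names `to_ansi_bytes` and `round_trip` are hypothetical.

```python
# Hypothetical illustration: emulate how an "A" (ANSI) Windows API would see a
# Unicode filename, ASSUMING the active ANSI code page is cp1252.
# Real Windows conversions may use "best fit" mappings; this is only a sketch.

def to_ansi_bytes(name: str, codepage: str = "cp1252") -> bytes:
    """Encode a filename the lossy way: unrepresentable characters become '?'."""
    return name.encode(codepage, errors="replace")


def round_trip(name: str, codepage: str = "cp1252") -> str:
    """Encode to the ANSI code page and decode back, as a bytes-based API would."""
    return to_ansi_bytes(name, codepage).decode(codepage)


if __name__ == "__main__":
    for name in ["report.txt", "café.txt", "日本語.txt"]:
        back = round_trip(name)
        status = "ok" if back == name else "LOST"
        print(f"{name!r} -> {to_ansi_bytes(name)!r} -> {back!r} [{status}]")
```

Under these assumptions "café.txt" survives, because é exists in cp1252. "日本語.txt" comes back as "???.txt", which is the kind of silent loss a W-function (UTF-16) caller never sees.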

That doesn't mean the problem can't be solved. Apple solved their equivalent problem, albeit by sacrificing backward compatibility in a way Microsoft can't get away with. I haven't seen a MacRoman or Shift-JIS filename since they broke the last holdout (the low-level AppleEvent interface) in 10.7--and most of the apps I was using back then don't run on 10.10 without an update. So Python 2 works great on Macs, whether you use bytes or unicode. But that doesn't help us on Windows, where you can't use bytes, or Linux, where you can't use Unicode (without surrogate escape or some other mechanism that Python 2 doesn't have).
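For reference, the "surrogate escape" mechanism mentioned above is Python 3's `surrogateescape` error handler (PEP 383). The sketch below shows how arbitrary POSIX filename bytes can round-trip through `str` even when they are not valid in the filesystem encoding. UTF-8 is assumed as that encoding, and `decode_name`/`encode_name` are hypothetical helper names.

```python
# Sketch of Python 3's "surrogateescape" error handler (PEP 383), which lets
# arbitrary filename bytes round-trip through str even when they are not
# valid in the filesystem encoding. UTF-8 is ASSUMED as that encoding here.

def decode_name(raw: bytes, encoding: str = "utf-8") -> str:
    # Bytes that cannot be decoded become lone surrogates U+DC80-U+DCFF.
    return raw.decode(encoding, errors="surrogateescape")


def encode_name(name: str, encoding: str = "utf-8") -> bytes:
    # The lone surrogates are turned back into the original bytes.
    return name.encode(encoding, errors="surrogateescape")


if __name__ == "__main__":
    raw = b"caf\xe9.txt"  # Latin-1 "café.txt": not valid UTF-8
    name = decode_name(raw)
    print(repr(name))  # 'caf\udce9.txt'
    assert encode_name(name) == raw  # lossless round trip
```

In practice `os.fsdecode` and `os.fsencode` perform this conversion on POSIX using the interpreter's filesystem encoding. Python 2 has no equivalent, which is why it is stuck with bytes on Linux.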