On 11 August 2016 at 01:41, Chris Angelico <rosuav@gmail.com> wrote:
I've almost never seen files stored in UTF-32 (even UTF-16 isn't all that common compared to UTF-8), so I wouldn't stress too much about that. Recognizing FE FF or FF FE and decoding as UTF-16 might be worth doing, but it could easily be retrofitted (that byte sequence won't decode as UTF-8).
I see UTF-16 relatively often as a result of redirecting stdout in Powershell and forgetting that it defaults (stupidly, IMO) to UTF-16.
The main problem here is that if the console is not forced to UTF-8 then it won't render any of the characters correctly.
Ehh, that's annoying. Is there a way to guarantee, at the process level, that the console will be returned to "normal state" when Python exits? If not, there's the risk that people run a Python program and then the *next* program gets into trouble.
There's also the risk that Python programs using subprocess.Popen start the subprocess with the console in a non-standard state. Should we be temporarily restoring the console codepage in that case? How does the following work? <start> set codepage to UTF-8 ... set codepage back spawn subprocess X, but don't wait for it set codepage to UTF-8 ... ... At this point what codepage does Python see? What codepage does process X see? (Note that they are both sharing the same console). ... <end> restore codepage Paul