Re: [Python-ideas] Fix default encodings on Windows

Steve Dower writes:
I plan to use only Unicode to interact with the OS and then utf8 within Python if the caller wants bytes.
This doesn't answer Victor's questions, or mine. This proposal requires identifying and transcoding bytes that represent text in encodings other than UTF-8.

1. How do you propose to identify "bytes that represent text (and might be filenames)" if they did *not* originate in a filesystem or console API?

2. How do you propose to identify the non-UTF-8 encoding, if you have forced all variables signifying bytes encodings to UTF-8?

Additional considerations: As far as I can see, this is just a recipe for a different way to get mojibake. *The* way to avoid mojibake is to "let text be text" *internally*. Developers who insist on processing text as bytes are going to get what they deserve *in edge cases*. But mostly (ie, in the mono-encoding environments of most users) it just (barely ;-) works.

And there are many use cases where you *can* process bytes that happen to encode text as "just bytes" (eg, low-level networking code). These cases have performance issues if the bytes-text-bytes-text-bytes double-round-trip implied for *stream content* (vs the OS APIs you're concerned with, which effectively round-trip text-bytes-text) is imposed on them.

I guess I'm not sure what your question is then. Using text internally is of course the best way to deal with it. But for those who insist on using bytes, this change at least makes Windows a feasible target without requiring manual encoding/decoding at every boundary.

Top-posted from my Windows Phone

-----Original Message-----
From: "Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp>
Sent: 8/14/2016 22:06
To: "Steve Dower" <steve.dower@python.org>
Cc: "Victor Stinner" <victor.stinner@gmail.com>; "python-ideas" <python-ideas@python.org>; "Random832" <random832@fastmail.com>
Subject: RE: [Python-ideas] Fix default encodings on Windows
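For context, the "manual encoding/decoding at every boundary" that bytes-path callers do today is what os.fsencode() and os.fsdecode() exist for: they convert between str and bytes using the filesystem encoding. A minimal sketch of that existing boundary (an illustration only, not the proposed patch):

```python
import os

# str paths go straight to the OS's Unicode APIs; a caller working in
# bytes must transcode at the boundary with the filesystem encoding.
for entry in os.listdir("."):            # str in, str out
    raw = os.fsencode(entry)             # str -> bytes (filesystem encoding)
    assert os.fsdecode(raw) == entry     # bytes -> str round-trips cleanly
    print(raw)
```

The proposal's effect, as I understand it, is to pin the encoding these helpers use on Windows to UTF-8 so that the round trip above is lossless for all valid filenames.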

participants (2)
- Stephen J. Turnbull
- Steve Dower