[Python-ideas] Processing surrogates in

Sat May 16 12:19:26 CEST 2015

On 16 May 2015 at 04:56, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> That's not the use case envisioned for these functions, though.  You
> want to change the textual content of the stream (by restricting the
> repertoire), not change the representation of non-textual content.

Thanks. I see the difference now (plus Nick's point about needing to
know the encoding in my use case).

>  > The encode/decode pair seemed ugly, although it was the only way I
>  > could find.
>
> I find the fact that there's an output stream with an inappropriate
> error handler far uglier!

The stream in this case was sys.stdout, which you can't blame me for, though :-)

The use case in question was specifically wanting to avoid encoding
errors when printing arbitrary text (on Windows, where
sys.stdout.encoding is not UTF-8). This is a pretty common issue that
I see raised a lot, and it is frustrating to have to deal with it in
application code. I don't know enough about the issues to make a good
case that errors='strict' is the wrong error handling policy for
sys.stdout, though. And you can't change the policy on an existing
stream, so the application is stuck with strict unless it wants to
re-wrap sys.stdout.buffer (which I'm always a little reluctant to do,
as it seems like it may cause other issues, although I don't know why
I think that :-)).
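
For concreteness, re-wrapping would look roughly like the sketch below. The
helper name rewrap_with_errors and the choice of 'backslashreplace' are
illustrative assumptions only, not anything proposed in this thread.

```python
import io
import sys


def rewrap_with_errors(stream, errors="backslashreplace"):
    """Return a new text wrapper over stream.buffer with a different
    error handler (hypothetical helper, for illustration only)."""
    # Flush pending text so output written through the old wrapper is
    # not reordered relative to the new one.
    stream.flush()
    return io.TextIOWrapper(
        stream.buffer,
        encoding=stream.encoding,
        errors=errors,
        line_buffering=stream.line_buffering,
    )


# Possible usage (commented out so that importing this has no side effects):
# sys.stdout = rewrap_with_errors(sys.stdout)
# print("snowman: \u2603")  # escaped instead of raising UnicodeEncodeError
```

One caveat with this approach: both wrappers now share a single underlying
buffer, so the old wrapper has to stay alive (sys.__stdout__ normally keeps a
reference to it), otherwise the buffer may be closed when the old wrapper is
garbage collected. Python 3.7 and later also provide
TextIOWrapper.reconfigure(errors=...), which changes the handler in place.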

> Note that the encode/decode pair is quite efficient, although the
> "rehandle" function could be about twice as fast.  Still, if you're
> output-bound by the speed of a disk or the like, encode/decode will
> have no trouble keeping up.

Yeah, it's not a performance issue, just a mild feeling of "this looks clumsy".
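
For reference, the encode/decode pair under discussion is roughly the
following. This is a minimal sketch: the function name printable and the
'backslashreplace' handler are my own assumptions, and any handler valid for
encoding would work the same way.

```python
import sys


def printable(text, encoding=None, errors="backslashreplace"):
    """Return text restricted to what `encoding` can represent
    (hypothetical helper, for illustration only)."""
    # Default to the encoding of the current stdout; fall back to UTF-8
    # if stdout has been replaced by an object without an encoding.
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    # Encoding applies the lenient error handler; decoding with the same
    # codec then recovers a str that is safe to print under 'strict'.
    return text.encode(encoding, errors=errors).decode(encoding)


# Possible usage:
# print(printable("caf\u00e9 \u2603"))
```

With an ASCII target, printable("caf\u00e9", "ascii") gives the string
caf\xe9 with a literal backslash, which a strict ASCII stream can then write
without error.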

Paul