Unicode encoding - ignoring errors
Chris Rebert
clp at rebertia.com
Mon Dec 29 07:10:46 EST 2008
On Mon, Dec 29, 2008 at 4:06 AM, Michal Ludvig <mludvig at logix.net.nz> wrote:
> Hi,
>
> in my script I have sys.stdout and sys.stderr redefined to output
> unicode strings in the current system encoding:
>
> encoding = locale.getpreferredencoding()
> sys.stdout = codecs.getwriter(encoding)(sys.stdout)
>
> However, on some systems the locale doesn't let all the unicode chars be
> displayed and I eventually end up with a UnicodeEncodeError exception.
>
> I know I could explicitly "sanitize" all output with:
>
> whatever.encode(encoding, "replace")
>
> but it's quite inconvenient. I'd much prefer to embed this "replace"
> operation into the sys.stdout writer.
>
> Is there any way to set a conversion error handler in codecs.getwriter()
> or perhaps chain it with some other filter somehow? I prefer to have
> question marks in the output instead of experiencing crashes with
> UnicodeEncodeErrors ;-)

You really should read the fine module docs (namely,
http://docs.python.org/library/codecs.html ).

codecs.getwriter() returns a StreamWriter subclass (basically).
The constructor of said subclass has the signature:

    StreamWriter(stream[, errors])

You want the 'errors' argument.
So all you have to do is add one argument to your stdout reassignment:

    sys.stdout = codecs.getwriter(encoding)(sys.stdout, 'replace')

Yay Python, for making such things easy!
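
To see the effect of the 'errors' argument without touching the real
stdout, here is a minimal sketch. The helper name write_replacing and
the io.BytesIO stand-in stream are assumptions made for illustration;
they are not part of the original script. On Python 3 the writer needs
a byte stream (such as sys.stdout.buffer) rather than sys.stdout itself.

```python
import codecs
import io


def write_replacing(text, encoding):
    """Write text through a StreamWriter created with errors='replace'.

    Hypothetical helper: returns the bytes that reached the stream.
    """
    raw = io.BytesIO()  # stand-in for a byte-oriented stdout
    writer = codecs.getwriter(encoding)(raw, 'replace')
    writer.write(text)  # unencodable characters become '?'
    return raw.getvalue()


if __name__ == "__main__":
    # The e-acute cannot be encoded as ASCII, so it is replaced
    # instead of raising UnicodeEncodeError.
    print(write_replacing(u"caf\u00e9", "ascii"))
```

With 'strict' (the default) the same write would raise
UnicodeEncodeError; with 'replace' the output degrades to question
marks, which is what the original poster asked for.
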
Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com