Unicode: console print statement and PythonWin: replacement for off-table chars HOWTO?

Robert kxroberto at googlemail.com
Tue Jan 10 15:46:08 EST 2006


Fredrik Lundh wrote:
> > Are you certain that this is a valid unicode character? Checking other
> > values (like \u0020 which is a blank space) seems to work okay. What
> > does \u034A represent?
>
> >>> import unicodedata
> >>> unicodedata.name(u"\u034A")
> 'COMBINING NOT TILDE ABOVE'
>
> (space is a valid CP850 character, combining not tilde above is not).
>

I experimented a bit to get a tolerant print/PythonWin setup. It seems I
can live acceptably with this:

changing the default encoding in site.py to 'mbcs' on Windows and to
'utf-8', 'latin-1' or the locale's encoding on Linux, and/or (more
important) doing this at startup:

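# tolerant unicode output ... #
import sys

_stdout = sys.stdout
if sys.platform == 'win32' and 'pywin.framework.startup' not in sys.modules:
    # plain console (not PythonWin): use the console code page, e.g. cp850;
    # fall back to the default encoding when stdout reports none
    _stdoutenc = getattr(_stdout, 'encoding', None) or sys.getdefaultencoding()
    class StdOut:
        def write(self, s):
            # off-table characters are written as \uXXXX instead of raising
            _stdout.write(s.encode(_stdoutenc, 'backslashreplace'))
    sys.stdout = StdOut()
elif sys.platform.startswith('linux'):
    import locale
    # the locale may report no encoding at all; fall back to the default
    _stdoutenc = locale.getdefaultlocale()[1] or sys.getdefaultencoding()
    class StdOut:
        def write(self, s):
            _stdout.write(s.encode(_stdoutenc, 'backslashreplace'))
    sys.stdout = StdOut()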
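
To show what the wrapper does without touching the real console, here is a
minimal sketch of the same idea. The names TolerantWriter and demo are made
up for illustration, and an in-memory byte buffer stands in for a CP850
console; it is written for a current Python 3 interpreter, so treat it as
an assumption-laden sketch rather than the setup above:

```python
import io


class TolerantWriter:
    """Hypothetical wrapper: encode text for a byte stream without raising."""

    def __init__(self, raw, encoding, errors="backslashreplace"):
        self._raw = raw            # byte stream that receives the output
        self._encoding = encoding  # target code page, e.g. "cp850"
        self._errors = errors      # codec error handler to apply

    def write(self, s):
        # Characters missing from the target code page are written as
        # backslash escapes instead of raising UnicodeEncodeError.
        self._raw.write(s.encode(self._encoding, self._errors))

    def flush(self):
        self._raw.flush()


def demo():
    buf = io.BytesIO()                  # stands in for a CP850 console
    out = TolerantWriter(buf, "cp850")
    out.write(u"a\u034Ab\n")            # U+034A has no CP850 mapping
    return buf.getvalue()


print(demo())  # b'a\\u034ab\n'
```

The 'a', 'b' and newline pass through unchanged; only the combining
character is turned into an escape.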


These are fragile tricks, and a pain to repeat for each project and Python
installation. Shouldn't something like this (or 'replace'), or a prominent
switch function for such behaviour, be the default for Python: output
the maximum, not the minimum?
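
A rough sketch of what such a switch function could look like, and of how
'replace' differs from 'backslashreplace'. The function names are
hypothetical and it assumes a Python 3 style io module where a text layer
is wrapped around a byte stream; it is not an existing Python switch:

```python
import io


def tolerant_text_stream(binary_stream, encoding, errors="replace"):
    """Hypothetical switch: wrap a byte stream with a chosen error handler."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)


def show(errors):
    raw = io.BytesIO()                      # stands in for the console
    out = tolerant_text_stream(raw, "cp850", errors)
    out.write(u"x\u034Ay")                  # U+034A is not in CP850
    out.flush()                             # push text through to raw
    return raw.getvalue()


print(show("replace"))           # b'x?y'
print(show("backslashreplace"))  # b'x\\u034ay'
```

With 'replace' the off-table character is lost as '?', while
'backslashreplace' keeps enough information to recover it. On Python 3.7
and later, as far as I know, sys.stdout.reconfigure(errors=...) offers a
similar switch for the real console stream.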

Robert



