Unicode and MoinMoin

Mon Feb 27 12:21:14 EST 2006

Neil Hodgson wrote:

> > The only issue I'm having relates to Unicode. MoinMoin and Python are
> > pretty unforgiving about files that contain Unicode characters that
> > aren't included in the coding properly. I've spent hours reading about
> > Unicode, and playing with different encoding/decoding commands, but at
> > this point, I just want a hacky solution that will ignore the
> > improperly coded characters or replace them with placeholders.
>
>     Call the codec with the errors argument set to "ignore" or "replace".
>
>  >>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G.
> A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8')
> Traceback (most recent call last):
>    File "<interactive input>", line 1, in ?
>    File "c:\python24\lib\encodings\utf_8.py", line 16, in decode
>      return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 58:
> unexpected code byte
>  >>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G.
> A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8', 'replace')
> u'AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \ufffd For
> references see blahblah.\n\n\n-----\n\n'
>
>     BTW, it's probably in Windows-1252, where it would be a dash.
> Depending on your context it may pay to handle the exception instead of
> using "replace" and attempt interpreting as Windows-1252.
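
A minimal sketch of that exception-handling approach, for illustration only. It is written in the Python 3 spelling (bytes.decode) rather than the unicode() call from the Python 2.4 session above, and the helper name decode_with_fallback and its parameters are made up for this example. It assumes that anything which is not valid UTF-8 is Windows-1252, which may not hold for your files:

```python
def decode_with_fallback(data, primary="utf-8", fallback="cp1252"):
    """Decode bytes as `primary`; if that fails, retry as `fallback`.

    Hypothetical helper, not part of MoinMoin or the standard library.
    """
    try:
        # Strict decode first, so well-formed input is left untouched.
        return data.decode(primary)
    except UnicodeDecodeError:
        # Assumption: invalid UTF-8 means the bytes are Windows-1252.
        # "replace" covers the few byte values cp1252 leaves undefined.
        return data.decode(fallback, "replace")


raw = b"G. A. \x96 For references see blahblah."
text = decode_with_fallback(raw)
print(repr(text))
```

Here the 0x96 byte should come out as U+2013 (en dash) instead of the U+FFFD placeholder that 'replace' alone produces.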

here's one way to explicitly deal with 1252 gremlins:

    http://effbot.org/zone/unicode-gremlins.htm

</F>