Unicode and MoinMoin
Fredrik Lundh
fredrik at pythonware.com
Mon Feb 27 12:21:14 EST 2006
Neil Hodgson wrote:
> > The only issue I'm having relates to Unicode. MoinMoin and python are
> > pretty unforgiving about files that contain Unicode characters that
> > aren't included in the coding properly. I've spent hours reading about
> > Unicode, and playing with different encoding/decoding commands, but at
> > this point, I just want a hacky solution that will ignore the
> > improperly coded characters or replace them with placeholders.
>
> Call the codec with the errors argument set to "ignore" or "replace".
>
> >>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8')
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
>   File "c:\python24\lib\encodings\utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 58: unexpected code byte
> >>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8', 'replace')
> u'AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \ufffd For references see blahblah.\n\n\n-----\n\n'
>
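For readers on Python 3: the Python 2 call `unicode(s, encoding, errors)` corresponds to `bytes.decode(encoding, errors=...)`. A minimal sketch using the same byte string; the names `raw`, `bad_pos`, `replaced` and `ignored` are illustrative only.

```python
# The byte string from the session above, as Python 3 bytes.
raw = (b"AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. "
       b"A. \x96 For references see blahblah.\n\n\n-----\n\n")

try:
    raw.decode("utf-8")  # errors="strict" is the default: raises on 0x96
except UnicodeDecodeError as exc:
    bad_pos = exc.start  # index of the first byte that is not valid UTF-8
    print(bad_pos, hex(raw[bad_pos]))  # prints: 58 0x96

replaced = raw.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD
ignored = raw.decode("utf-8", errors="ignore")    # bad byte is silently dropped
```

"replace" at least leaves a visible marker where data was lost; "ignore" removes it without a trace.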
> BTW, it's probably in Windows-1252, where 0x96 is an en dash.
> Depending on your context it may pay to handle the exception instead of
> using "replace" and attempt to interpret the text as Windows-1252.
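The "handle the exception and try Windows-1252" approach could be sketched in Python 3 as below. `decode_text` is a hypothetical helper, and the fallback is a heuristic: it assumes that any input which is not valid UTF-8 is cp1252. Python's cp1252 codec leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so the fallback uses "replace" to avoid a second exception.

```python
def decode_text(raw: bytes) -> str:
    """Decode as UTF-8; if that fails, retry as Windows-1252 (cp1252)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Heuristic fallback: treat non-UTF-8 input as cp1252.
        # "replace" covers the few byte values cp1252 does not define.
        return raw.decode("cp1252", errors="replace")


print(decode_text(b"G. A. \x96 For references"))  # prints: G. A. – For references
```

Valid UTF-8 input is returned unchanged, so the fallback only affects data that would otherwise have raised.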
Here's one way to deal explicitly with Windows-1252 "gremlins":
http://effbot.org/zone/unicode-gremlins.htm
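The linked page is not reproduced here. The following is an independent sketch of the same idea, assuming the text has already been decoded as ISO-8859-1 (Latin-1), so that cp1252 punctuation shows up as C1 control characters (U+0080 to U+009F). `fix_gremlins` is a hypothetical name, and this is not the code from that page.

```python
import re

# C1 control characters: where cp1252 punctuation lands when
# Windows-1252 bytes are misread as ISO-8859-1 (Latin-1).
_C1 = re.compile(r"[\x80-\x9f]")


def fix_gremlins(text: str) -> str:
    """Map stray C1 characters to their Windows-1252 meaning."""
    def repl(match):
        ch = match.group(0)
        try:
            # Reinterpret the code point as a single cp1252 byte.
            return bytes([ord(ch)]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90, 0x9D are undefined in cp1252: keep as-is.
            return ch
    return _C1.sub(repl, text)


print(fix_gremlins("G. A. \x96 For references"))  # prints: G. A. – For references
```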
</F>