Unicode and MoinMoin
Neil Hodgson
nyamatongwe+thunder at gmail.com
Mon Feb 27 00:49:54 EST 2006
Greg:
> The only issue I'm having relates to Unicode. MoinMoin and python are
> pretty unforgiving about files that contain Unicode characters that
> aren't included in the coding properly. I've spent hours reading about
> Unicode, and playing with different encoding/decoding commands, but at
> this point, I just want a hacky solution that will ignore the
> improperly coded characters or replace them with placeholders.
Call the codec with the errors argument set to "ignore" or "replace".
>>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "c:\python24\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 58: unexpected code byte
>>> unicode('AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \x96 For references see blahblah.\n\n\n-----\n\n', 'utf8', 'replace')
u'AUTHOR: blahblah\n\nTITLE: Reading Course Readings... G. A. \ufffd For references see blahblah.\n\n\n-----\n\n'
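The session above only shows "replace". For comparison, here is a small sketch of both error handlers side by side. It is written in Python 3 syntax (bytes.decode) rather than the Python 2 unicode() call used above, and the sample bytes are just a shortened version of the string in the session.

```python
# Shortened sample: valid ASCII plus one stray byte (0x96) that is not valid UTF-8.
raw = b"G. A. \x96 For references see blahblah."

# "ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# "replace" substitutes U+FFFD (the replacement character) for it.
replaced = raw.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
```

"ignore" loses the information that something was there at all, so "replace" is usually the safer of the two if you want to find the bad spots later.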
BTW, it's probably in Windows-1252, where byte 0x96 is an en dash (U+2013).
Depending on your context, it may pay to catch the exception instead of
using "replace", and then try decoding the data as Windows-1252.
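A minimal sketch of that fallback, assuming Python 3 and that the data is either UTF-8 or Windows-1252. The function name decode_with_fallback and its parameters are made up for illustration; they are not part of MoinMoin or the standard library.

```python
def decode_with_fallback(raw, primary="utf-8", fallback="cp1252"):
    """Decode bytes as `primary`; on failure, retry as `fallback`.

    Hypothetical helper for illustration only. The fallback pass uses
    errors="replace" because a few bytes (e.g. 0x81) are undefined in
    Python's cp1252 codec and would otherwise raise again.
    """
    try:
        return raw.decode(primary)
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")


# The stray 0x96 byte becomes an en dash instead of U+FFFD.
text = decode_with_fallback(b"G. A. \x96 For references see blahblah.")
print(repr(text))

# Valid UTF-8 input is decoded normally and never reaches the fallback.
utf8_text = decode_with_fallback("caf\u00e9".encode("utf-8"))
print(repr(utf8_text))
```

This only helps when Windows-1252 really is the likely source encoding; for files of unknown origin the fallback will still produce text, but not necessarily the right text.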
Neil