Unicode is driving me nuts!
skip at pobox.com
Sat Mar 13 01:34:42 CET 2004
Anthony> str = unicode(raw_str, myencoding)
Anthony> This works just fine with a small sample Chinese document.
Anthony> But when I attempted to run the script on the entire corpus, I
Anthony> get the typical "incomplete multibyte sequence error" or
Anthony> "UnicodeEncodeError: 'ascii' codec can't encode characters in
Anthony> position 0-23: ordinal not in range(128)"
Can you craft a small example which demonstrates the error but which you
think is correctly encoded?
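For what it's worth, a minimal sketch of such an example might look like the one below. It is an illustration under assumptions, not a diagnosis of your corpus: the sample text, the gb2312 encoding (standing in for your "myencoding") and the describe() helper are all made up here. It is also written for Python 3, where raw.decode(enc) plays the role of unicode(raw, enc). It shows that the two messages you quote come from opposite directions: one from decoding bytes that end in the middle of a character, the other from encoding non-ASCII text with the ascii codec.

```python
# Hypothetical sample: two Chinese characters, written as escapes so the
# script itself stays pure ASCII.
sample = "\u4e2d\u6587"
encoding = "gb2312"  # assumption; substitute the corpus's real encoding

# The raw bytes as they might be read from a file.
raw = sample.encode(encoding)


def describe(action):
    """Call action() and return the name of any Unicode error it raises."""
    try:
        action()
    except UnicodeError as exc:
        return type(exc).__name__
    return "ok"


results = {
    # Complete, correctly encoded bytes decode without trouble.
    "complete decode": describe(lambda: raw.decode(encoding)),
    # Dropping the last byte leaves half a character behind.
    "truncated decode": describe(lambda: raw[:-1].decode(encoding)),
    # Non-ASCII text cannot be encoded with the ascii codec.
    "encode to ascii": describe(lambda: sample.encode("ascii")),
}

for label, outcome in results.items():
    print(label, "->", outcome)
```

If the corpus is read in fixed-size chunks, or a file was cut off mid-character, the truncated case is one plausible source of the first message. The second usually means a Unicode string was written or printed somewhere that fell back to ASCII.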
Anthony> I am at my wit's end, so frustrated at handling
Anthony> non-ascii texts.
Unicode creates lots of problems for the uninitiated. I pulled my hair out
for a long time. It took me a couple of tries to get my system to work
(more or less) with Unicode. It still has the occasional problem.