Need debugging knowhow for my creeping Unicodephobia
no.email at please.post
Wed Feb 10 20:09:46 CET 2010
Some people have mathphobia. I'm developing a wicked case of Unicodephobia.
I have read a *ton* of stuff on Unicode. It doesn't even seem all
that hard. Or so I think. Then I start writing code, and WHAM:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
(There, see? My Unicodephobia just went up a notch.)
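
A minimal sketch of what that message reports, for reference. The bytes below are made up for illustration (the post does not show the real data), and decode_error is a hypothetical helper, not part of any library. The exception object carries the codec name, the offending position, and the original bytes, and these are usually the first things worth looking at.

```python
from __future__ import print_function


def decode_error(raw, encoding='ascii'):
    """Try to decode raw bytes; return the UnicodeDecodeError, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc
    return None


# Hypothetical input: two bytes, the first of which is 0xc2.
err = decode_error(b'\xc2\xa0')

print(err)                 # the full message
print(err.encoding)        # which codec was used
print(err.start)           # index of the first byte that failed
print(repr(err.object))    # the byte string being decoded
```

If the codec named in the message is 'ascii' and the code never asked for ASCII, some implicit conversion from a byte string to unicode is probably taking place.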
Here's the thing: I don't even know how to *begin* debugging errors
like this. This is where I could use some help.
In the past I've gone for the method of choice of the clueless:
"programming by trial-and-error": try random crap until something
"works." And if that "strategy" fails, I come begging for help to
c.l.p. And thanks for the very effective pointers for getting rid
of the errors.
But afterwards I remain as clueless as ever... It's the old "give
a man a fish" vs. "teach a man to fish" story.
I need a systematic approach to troubleshooting and debugging these
Unicode errors. I don't know what. Some tools maybe. Some useful
modules or builtin commands. A diagnostic flowchart? I don't
think that any more RTFM on Unicode is going to help (I've done it
in spades), but if there's a particularly good write-up on Unicode
debugging, please let me know.
Any suggestions would be much appreciated.
FWIW, I'm using Python 2.6. The example above happens to come from
a script that extracts data from HTML files, which are all in
English, but such errors are a daily occurrence when I write code to
process non-English text. The script uses Beautiful Soup. I won't
post a lot of code because, as I said, what I'm after is not so
much a way around this specific error as the tools and
techniques to troubleshoot it and fix it on my own. But to ground
the problem a bit I'll say that the exception above happens during
the execution of a statement of the form:
    x = '%s %s' % (y, z)
Also, I found that, with the exact same values y and z as above,
all of the following statements work perfectly fine:
    x = '%s' % y
    x = '%s' % z
    print y, z
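
One pattern consistent with these symptoms, offered as an assumption since the post does not show y and z: one value is a unicode object and the other is a byte string holding non-ASCII bytes. In Python 2, formatting both in a single operation promotes the result to unicode and decodes the byte string with the ascii codec, which fails. Formatting either value alone involves no such mixing. The sketch below uses made-up values and two hypothetical helpers, describe and to_text, to show the two usual steps: look at the type and repr of each value, then decode byte strings explicitly before combining them.

```python
from __future__ import print_function


def describe(name, value):
    """Return one line with the type name and repr of a value."""
    return '%s: %s %r' % (name, type(value).__name__, value)


def to_text(value, encoding='utf-8'):
    """Decode byte strings explicitly; return text values unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical stand-ins for y and z; the real values are not in the post.
y = u'Price:'            # text (unicode in Python 2)
z = b'\xc2\xa0' + b'42'  # byte string starting with 0xc2 (UTF-8 assumed)

# Step 1: inspect types and reprs instead of printing the values.
print(describe('y', y))
print(describe('z', z))

# Step 2: convert everything to text first, then format.
x = u'%s %s' % (to_text(y), to_text(z))
print(describe('x', x))
```

The utf-8 default in to_text is itself an assumption. The right encoding depends on where the bytes came from, for example the charset declared by the HTML file.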