Need debugging knowhow for my creeping Unicodephobia

Chris Rebert clp2 at
Wed Feb 10 22:43:14 CET 2010

On Wed, Feb 10, 2010 at 1:03 PM, kj < at> wrote:
> In <402ac982-0750-4977-adb2-602b19149d81 at> Jonathan Gardner <jgardner at> writes:
<huge snip>
>>It sounds like someone, probably beautiful soup, is trying to turn
>>your strings into unicode. A full stacktrace would be useful to see
>>who did what where.
> Unfortunately, there's not much in the stacktrace:
> Traceback (most recent call last):
>  File "./", line 427, in <module>
>    x = "%s %s" % (table['id'],
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 41: ordinal not in range(128)

I think I've found the problem. According to the BeautifulSoup docs,
renderContents() returns a str (i.e. a byte sequence, UTF-8-encoded by
default) as opposed to unicode (i.e. an abstract code point sequence).
Thus, as was said previously, you're combining the unicode from
table['id'] with the str from renderContents(), so Python tries to
implicitly convert the str to unicode by decoding it as ASCII.
However, the data is not ASCII but UTF-8 (0xc2 is the lead byte of a
two-byte UTF-8 sequence), hence the error about non-ASCII bytes.
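
As a rough illustration of that failure mode, here is a minimal sketch.
The sample strings are made up and merely stand in for table['id'] and
the renderContents() output. The ASCII decode is written out explicitly
so the snippet behaves the same on Python 3, where the implicit
str-to-unicode coercion no longer exists.

```python
# Made-up stand-in for what renderContents() returns: UTF-8 bytes that
# contain a non-breaking space (U+00A0 encodes to the bytes 0xC2 0xA0).
rendered = u"price:\u00a0100".encode("utf-8")

# Decoding as ASCII is what Python 2 attempts behind the scenes when a
# byte string is mixed with unicode; doing it by hand shows the failure.
try:
    rendered.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# Decoding with the codec the bytes were actually written in succeeds.
text = rendered.decode("utf-8")
combined = u"%s %s" % (u"table1", text)  # u"table1" stands in for table['id']
print(repr(combined))
```

The position reported in the error is an offset into the byte string
being decoded, not into the format string.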

Solution: Convert the output of renderContents() back to unicode.

x = u"%s %s" % (table['id'], <...>.renderContents().decode('utf8'))

Now only unicode objects are being combined.
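
If the same mix-up keeps cropping up, one option is to decode at the
boundary with a small helper. This is only a sketch: to_unicode is a
hypothetical name, not part of BeautifulSoup, and the sample values are
made up.

```python
def to_unicode(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


table_id = u"results"                    # made-up stand-in for table['id']
rendered = u"caf\u00e9".encode("utf-8")  # made-up stand-in for renderContents()

# Both operands are text by the time they reach the format operator.
x = u"%s %s" % (to_unicode(table_id), to_unicode(rendered))
print(repr(x))
```

The helper leaves values that are already text untouched, so it is safe
to apply to both operands without checking which one is the byte string.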

Your problem is particularly ironic considering how well BeautifulSoup
handles Unicode overall; I was unable to locate a renderContents()
equivalent that returned unicode.

