Why does the "".join(r) do this?

Peter Otten __peter__ at web.de
Thu May 20 12:04:29 EDT 2004


Jim Hefferon wrote:

> I'm getting an error join-ing strings and wonder if someone can
> explain why the function is behaving this way?  If I .join in a string
> that contains a high character then I get an ascii codec decoding
> error.  (The code below illustrates.)  Why doesn't it just
> concatenate?

Let's reduce the problem to its simplest case:

>>> unichr(174) + chr(174)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)

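So why doesn't it just concatenate? To combine a unicode string with a str,
Python first has to decode the str to unicode, and it uses the default
ascii codec to do so. chr(174) is outside the ASCII range, and there is no
way of knowing which encoding was meant; the same byte is a different
character in different encodings: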

>>> chr(174).decode("latin1")
u'\xae'
>>> chr(174).decode("latin2")
u'\u017d'
>>>

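Use either unicode or str throughout, but don't mix them. If you do have to
combine the two, decode the str explicitly with the encoding you know it to
be in. That should keep you out of trouble.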
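
A minimal sketch of that approach applied to a join, assuming the byte
strings are known to be latin1. The helper name join_as_text and its
encoding parameter are made up for illustration, and the b"" literal needs
Python 2.6 or later (the code also runs on Python 3):

```python
# Illustrative helper, not part of any library: decode byte strings
# explicitly before joining so Python never falls back to the ascii codec.

def join_as_text(parts, encoding="latin1"):
    """Return the parts joined as one unicode string."""
    decoded = []
    for part in parts:
        if isinstance(part, bytes):
            # The encoding is our assumption about the data;
            # Python cannot guess it for us.
            part = part.decode(encoding)
        decoded.append(part)
    return u"".join(decoded)


# A mix of unicode strings and byte strings with high bytes.
parts = [u"caf", b"\xe9", u" ", b"\xae"]
result = join_as_text(parts)
```

Passing a different encoding changes the result for the same bytes, which
is exactly why the choice has to be made by you and not by Python.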

Peter



