Unicode and coercion

Wilfredo Sánchez wsanchez at apple.com
Fri Dec 12 13:44:58 EST 2003


   So I'm running into the very lovely exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)

   And I've got some workarounds, but I'd like to better understand 
what's going on.  First the code that throws:

     outfile.write(upc                       + "\t" +
                   title                     + "\t" +
                   playlist.display_artist() + "\t" +
                   playlist.release_date()   + "\t")

   The offending value comes from playlist.display_artist(), which was 
returning a unicode string obtained by parsing some UTF-8-encoded XML.  
The artist name is Chanté Moore, and I verified that the XML is encoded 
correctly.
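
   For reference, here is a minimal sketch of where a message like this 
can come from.  In Python 2, adding a byte string to a unicode string 
makes the interpreter decode the byte string with the default 'ascii' 
codec first.  The sketch below is written in Python 3 syntax, where that 
decode has to be spelled out explicitly, and it assumes the artist name 
ends up as UTF-8 bytes somewhere in the concatenation; artist_bytes and 
message are illustrative names, not part of the original code.

```python
# Assumption: the artist name has been encoded to UTF-8 bytes.
# "\u00e9" is the e-acute in "Chanté"; in UTF-8 it becomes 0xc3 0xa9.
artist_bytes = u"Chant\u00e9 Moore".encode("utf-8")

message = None
try:
    # Python 2 would run this decode implicitly when the byte string
    # is concatenated with a unicode string; here it is explicit.
    artist_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```

   The five ASCII bytes of "Chant" occupy positions 0 through 4, so the 
first non-ASCII byte (0xc3) sits at position 5, which matches the 
position reported in the exception.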

   So I changed the playlist class so that all strings fetched from XML 
get encode('utf-8') called on them, but this still fails, so that 
wasn't the (only?) problem.  What's surprising is that this works:

     outfile.write(upc                       + "\t")
     outfile.write(title                     + "\t")
     outfile.write(playlist.display_artist() + "\t")
     outfile.write(playlist.release_date()   + "\t")

   This is surprising because I would have expected to have to separate 
the "\t"s as well.  Can someone explain what's going on?  Why does it 
try to coerce the string to ascii in the first case but not the second? 
  And shouldn't utf-8 work in any case?
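
   One possible workaround, sketched under assumptions rather than taken 
from the real script: keep every field as a unicode string, join them, 
and encode to UTF-8 exactly once at the point of writing, so that byte 
strings and unicode strings are never mixed in a single expression.  The 
format_row helper, the field values, and the in-memory outfile below are 
all hypothetical stand-ins.

```python
import io

def format_row(fields):
    """Join unicode fields with tabs, keeping the trailing tab."""
    return u"\t".join(fields) + u"\t"

# Hypothetical sample values standing in for upc, title,
# playlist.display_artist() and playlist.release_date().
row = format_row([
    u"012345678905",
    u"Some Title",
    u"Chant\u00e9 Moore",
    u"2003-12-12",
])

# Stand-in for a file opened in binary mode.
outfile = io.BytesIO()

# Encode once, at the output boundary.
outfile.write(row.encode("utf-8"))
```

   With this arrangement the only conversion is the explicit 
encode("utf-8") call, so the default 'ascii' codec never gets involved.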

     -wsv