Unicode characters, XML/RSS

Adam W. AWasilenko at gmail.com
Thu Jul 31 06:36:51 CEST 2008

So I wrote a little video podcast downloading script that checks a
list of RSS feeds and downloads any new videos.  Every once in a while
it finds a character outside the ASCII range (0-127) in the feed and my
script blows up:

Traceback (most recent call last):
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 88, in <module>
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 75, in mainloop
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 69, in update
    couldhave = getshowlst(x[1],episodecnt)
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 30, in getshowlst
    masterlist = XMLWorkspace.parsexml(url)
  File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 54, in
    parse(url, FeedHandlerInst)
  File "C:\Python25\lib\xml\sax\__init__.py", line 33, in parse
  File "C:\Python25\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "C:\Python25\lib\xml\sax\xmlreader.py", line 123, in parse
  File "C:\Python25\lib\xml\sax\expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 51, in
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in
position 236: ordinal not in range(128)

Now it's my understanding that XML can contain non-ASCII Unicode
characters as long as the encoding is specified, which it is (UTF-8).
The feed passes every validator I've run it through, and every program
I open it with seems to be OK with it, except my Python script.  Why?
Here is the URL of the feed in question: http://revision3.com/winelibraryreserve/
My script is complaining about the è in Mourvèdre.

At first glance I thought it was the data.append(string) call that
would not accept the Unicode, but even if I put a return in the
character handler loop, it still breaks.  What am I doing wrong?
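
For reference, here is a minimal sketch of the situation described above.
It is not the actual Rev3.py or XMLWorkspace.py code, which is not shown
in this post.  The handler class TitleCollector and the inline FEED
string are hypothetical stand-ins for the real handler and the real
feed.  It only illustrates that the SAX parser hands the handler Unicode
text, and that encoding that text as ASCII (which Python 2 may do
implicitly, for example when printing or mixing it with byte strings)
fails on the è, while encoding it as UTF-8 works.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects character data as Unicode text."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # The parser passes decoded Unicode text here; store it unchanged.
        self.chunks.append(content)


# Stand-in for the feed: a UTF-8 encoded document containing U+00E8 (è).
FEED = (u'<?xml version="1.0" encoding="UTF-8"?>'
        u'<title>Mourv\xe8dre</title>').encode("utf-8")

handler = TitleCollector()
xml.sax.parseString(FEED, handler)
title = u"".join(handler.chunks)

# Encoding to ASCII fails on the non-ASCII character.
try:
    title.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to UTF-8 succeeds.
utf8_bytes = title.encode("utf-8")
```

If this sketch matches the real script, the parser itself is probably
fine and the failure would come from somewhere the text gets converted
to a byte string without an explicit encoding.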
