Looks like this was a solution:

1. Use this guy's unescape function to convert from HTML/XML entities to Unicode.

2. Take the Unicode text and convert it to approximate plain-ASCII matches with 
unicodedata (after import unicodedata):

ascii_content2 = unescape(line)

ascii_content = unicodedata.normalize('NFKD', 

The string "line" would give the error, but ascii_content does not.
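
For anyone finding this later, here is a minimal self-contained sketch of the two steps above. It is written for Python 3 and uses the standard library's html.unescape as a stand-in for the unescape function from step 1 (an assumption, not the function used originally). The helper name to_ascii and the final encode('ascii', 'ignore') step are also illustrative, not necessarily the exact code from this post.

```python
import html
import unicodedata


def to_ascii(line):
    """Approximate a string containing entities/non-ASCII text as plain ASCII."""
    # Step 1: turn HTML/XML entities (&eacute;, &#239;, ...) into Unicode characters.
    # html.unescape is a stand-in here, not the original unescape function.
    text = html.unescape(line)
    # Step 2: NFKD splits accented characters into a base letter
    # followed by separate combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop everything that is still not ASCII (the combining marks,
    # plus any character with no ASCII base). This step is lossy.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('caf&eacute; &amp; na&#239;ve'))  # cafe & naive
```

Note that characters with no decomposition into an ASCII base letter (for example "ø") are silently dropped rather than approximated, so this is only suitable when losing some characters is acceptable.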


PS: "asciiDammit" is also fun to look at

John Machin wrote:
> On Jun 11, 6:09 am, Nick Matzke <mat... at> wrote:
>> Hi all,
>> So I'm parsing an XML file returned from a database.  However, the
>> database entries have occasional non-ASCII characters, and this is
>> crashing my parsers.
> So fix your parsers. google("unicode"). Deleting stuff that you don't
> understand is an "interesting" approach to academic research :-(
> Care to divulge what "crash" means? e.g. the full traceback and error
> message, plus what version of python on what platform, what version of
> ElementTree or other XML software you are using ...
>> Center for Theoretical Evolutionary Genomics
> If your .sig evolves much more, it will consume all available
> bandwidth in the known universe and then some ;-)

