Unicode error in sax parser

Rickard Lindberg ricli85 at gmail.com
Wed Feb 9 03:32:14 EST 2011

On Tue, Feb 8, 2011 at 5:41 PM, Chris Rebert <clp2 at rebertia.com> wrote:
>> Here is a bash script to reproduce my error:
> Including the error message and traceback is still helpful, for future
> reference.

Thanks for pointing it out.

>>    #!/bin/sh
>>    cat > å.timeline <<EOF
> <snip>
>>    EOF
>>    python <<EOF
>>    # encoding: utf-8
>>    from xml.sax import parse
>>    from xml.sax.handler import ContentHandler
>>    parse(u"å.timeline", ContentHandler())
>>    EOF
>> If I instead do
>>    parse(u"å.timeline".encode("utf-8"), ContentHandler())
>> the script runs without errors.
>> Is this a bug or expected behavior?
> Bug; open() figures out the filesystem encoding just fine.
> Bug tracker to report the issue to: http://bugs.python.org/
> Workaround:
> parse(open(u"å.timeline", 'r'), ContentHandler())

When I tried your workaround, I still got this error:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/xmlreader.py", line 119, in parse
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 121, in prepareParser
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)

The open(...) call itself works fine, but the traceback shows that the error
is still raised inside the SAX parser, in prepareParser in expatreader.py.
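
One thing that might be worth trying is to hand the parser an InputSource
that carries only the byte stream and no file name. This is a minimal sketch
and rests on an assumption I have not verified: that prepareParser fails
because the unicode file name ends up as the source's system ID. The names
parse_stream and NameCollector are made up for illustration; InputSource,
make_parser and ContentHandler are the regular xml.sax classes.

```python
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class NameCollector(ContentHandler):
    """Hypothetical handler that records element names, to show parsing ran."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_stream(stream, handler):
    """Parse an already opened binary stream without giving the parser a name.

    Assumption: leaving the system ID unset keeps the non-ASCII file name
    away from the code path that raised the UnicodeEncodeError.
    """
    source = InputSource()          # system ID stays None
    source.setByteStream(stream)    # parser reads bytes from this object
    parser = make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
```

With the file from the script, the call would look like
parse_stream(open(u"å.timeline", "rb"), ContentHandler()). Whether this
avoids the error with the _xmlplus version of the parser is something I
would still have to test.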

Rickard Lindberg

