Unicode error in sax parser

Stefan Behnel stefan_ml at behnel.de
Wed Feb 9 09:58:19 CET 2011


Rickard Lindberg, 09.02.2011 09:32:
> On Tue, Feb 8, 2011 at 5:41 PM, Chris Rebert <clp2 at rebertia.com> wrote:
>>> Here is a bash script to reproduce my error:
>>
>> Including the error message and traceback is still helpful, for future
>> reference.
>
> Thanks for pointing it out.
>
>>>     #!/bin/sh
>>>
>>>     cat > å.timeline <<EOF
>> <snip>
>>>     EOF
>>>
>>>     python <<EOF
>>>     # encoding: utf-8
>>>     from xml.sax import parse
>>>     from xml.sax.handler import ContentHandler
>>>     parse(u"å.timeline", ContentHandler())
>>>     EOF
>>>
>>> If I instead do
>>>
>>>     parse(u"å.timeline".encode("utf-8"), ContentHandler())
>>>
>>> the script runs without errors.
>>>
>>> Is this a bug or expected behavior?
>>
>> Bug; open() figures out the filesystem encoding just fine.
>> Bug tracker to report the issue to: http://bugs.python.org/
>>
>> Workaround:
>> parse(open(u"å.timeline", 'r'), ContentHandler())
>
> When I tried your workaround, I still got this error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 4, in <module>
>   File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
>     parser.parse(filename_or_stream)
>   File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
>     xmlreader.IncrementalParser.parse(self, source)
>   File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/xmlreader.py", line 119, in parse
>     self.prepareParser(source)
>   File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 121, in prepareParser
>     self._parser.SetBase(source.getSystemId())
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)
>
> The open(..) part works fine, but there still seems to be a problem inside the
> sax parser.

Did you read my reply?

Stefan
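
For reference: the quoted traceback fails in prepareParser(), where the
source's system ID is handed to SetBase(), not in open(). A minimal sketch
of one way to sidestep that, assuming the parser only calls SetBase() when
a system ID is set, is to wrap an already opened binary stream in an
InputSource and leave the system ID unset. The names parse_file and
ElementCounter are made up for illustration, and this is not necessarily
the approach suggested earlier in the thread.

```python
# -*- coding: utf-8 -*-
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class ElementCounter(ContentHandler):
    """Hypothetical handler that counts start tags, to show parsing ran."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_file(path, handler):
    """Parse the file at `path` without giving the parser a system ID.

    Assumption: with no system ID on the InputSource, the expat reader
    has no (possibly non-ASCII) file name to encode.
    """
    with io.open(path, "rb") as stream:
        source = InputSource()        # system ID deliberately left unset
        source.setByteStream(stream)  # the parser reads from this stream
        parser = make_parser()
        parser.setContentHandler(handler)
        parser.parse(source)
```

Usage would look like parse_file(u"å.timeline", ElementCounter()). The
trade-off is that, without a system ID, relative references to external
entities in the document have no base to resolve against.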



