Unicode error in sax parser
Stefan Behnel
stefan_ml at behnel.de
Wed Feb 9 03:58:19 EST 2011
Rickard Lindberg, 09.02.2011 09:32:
> On Tue, Feb 8, 2011 at 5:41 PM, Chris Rebert <clp2 at rebertia.com> wrote:
>>> Here is a bash script to reproduce my error:
>>
>> Including the error message and traceback is still helpful, for future
>> reference.
>
> Thanks for pointing it out.
>
>>> #!/bin/sh
>>>
>>> cat > å.timeline <<EOF
>> <snip>
>>> EOF
>>>
>>> python <<EOF
>>> # encoding: utf-8
>>> from xml.sax import parse
>>> from xml.sax.handler import ContentHandler
>>> parse(u"å.timeline", ContentHandler())
>>> EOF
>>>
>>> If I instead do
>>>
>>> parse(u"å.timeline".encode("utf-8"), ContentHandler())
>>>
>>> the script runs without errors.
>>>
>>> Is this a bug or expected behavior?
>>
>> Bug; open() figures out the filesystem encoding just fine.
>> Bug tracker to report the issue to: http://bugs.python.org/
>>
>> Workaround:
>> parse(open(u"å.timeline", 'r'), ContentHandler())
>
> When I tried your workaround, I still got this error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 4, in <module>
>   File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
>     parser.parse(filename_or_stream)
>   File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
>     xmlreader.IncrementalParser.parse(self, source)
>   File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/xmlreader.py", line 119, in parse
>     self.prepareParser(source)
>   File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 121, in prepareParser
>     self._parser.SetBase(source.getSystemId())
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)
>
> The open(..) part works fine, but there still seems to be a problem inside the
> sax parser.
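
The traceback fails in SetBase(source.getSystemId()), not in open(), so the
non-ASCII name seems to reach expat as the system ID (presumably taken from
the file object's name). A minimal sketch of one way around that, assuming
the system ID is the only place the name leaks through: build the InputSource
by hand and attach only the byte stream, leaving the system ID unset. The
names parse_by_stream and ElementCounter are made up for illustration, and
this has not been run against the Python 2.7 / _xmlplus setup shown above.

```python
# -*- coding: utf-8 -*-
import io
import os
import shutil
import tempfile
from xml.sax import parse
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts start tags to show the parse ran."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_by_stream(path, handler):
    """Parse the file at `path` without passing its name on as a system ID."""
    with io.open(path, "rb") as stream:
        source = InputSource()        # system ID is left as None on purpose
        source.setByteStream(stream)  # the parser reads from the stream only
        parse(source, handler)


# Demo on a throwaway file with a non-ASCII name.
tmpdir = tempfile.mkdtemp()
try:
    path = os.path.join(tmpdir, u"å.timeline")
    with io.open(path, "wb") as f:
        f.write(b"<timeline><event/><event/></timeline>")
    counter = ElementCounter()
    parse_by_stream(path, counter)
    print(counter.count)
finally:
    shutil.rmtree(tmpdir)
```

The trade-off is that without a system ID the parser has no base for
resolving relative external entities, so this only suits documents that do
not rely on them.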
Did you read my reply?
Stefan