[issue11159] Sax parser crashes if given unicode file name

Rickard Lindberg report at bugs.python.org
Wed Feb 9 15:20:03 CET 2011


New submission from Rickard Lindberg <ricli85 at gmail.com>:

Passing a unicode file name to xml.sax.parse raises the following error:

    Traceback (most recent call last):
      File "<stdin>", line 4, in <module>
      File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/__init__.py", line 31, in parse
        parser.parse(filename_or_stream)
      File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/xmlreader.py", line 119, in parse
        self.prepareParser(source)
      File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 121, in prepareParser
        self._parser.SetBase(source.getSystemId())
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 0: ordinal not in range(128)
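
The last frame suggests the unicode system id is being implicitly encoded as ASCII when handed to SetBase. That is a reading of the traceback, not something confirmed in the expat bindings. A minimal sketch of the same failure outside the parser, where ascii_encode_error is a hypothetical helper name used only for illustration:

```python
def ascii_encode_error(name):
    """Return the error message if `name` cannot be encoded as ASCII, else None.

    Hypothetical helper: it only mimics the implicit ASCII encoding that the
    traceback appears to show; it does not call into the SAX parser.
    """
    try:
        name.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# u"\xe5" is the character 'å' from the file name in this report.
message = ascii_encode_error(u"\xe5.timeline")
print(message)
```

A pure-ASCII name such as u"a.timeline" returns None here, which fits the problem only showing up for non-ASCII file names.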

The following shell script can be used to reproduce the error:

    #!/bin/sh

    cat > å.timeline <<EOF
    <?xml version="1.0" encoding="utf-8"?>
    <timeline>
      <version>0.13.0devb38ace0a572b+</version>
      <categories>
      </categories>
      <events>
        <event>
          <start>2011-02-01 00:00:00</start>
          <end>2011-02-03 08:46:00</end>
          <text>asdsd</text>
        </event>
      </events>
      <view>
        <displayed_period>
          <start>2011-01-24 16:38:11</start>
          <end>2011-02-23 16:38:11</end>
        </displayed_period>
        <hidden_categories>
        </hidden_categories>
      </view>
    </timeline>
    EOF

    python <<EOF
    # encoding: utf-8
    from xml.sax import parse
    from xml.sax.handler import ContentHandler
    parse(u"å.timeline", ContentHandler())
    EOF

If I instead do this, it works fine:

    parse(u"å.timeline".encode("utf-8"), ContentHandler())

Also:

    >>> import sys
    >>> sys.getfilesystemencoding()
    'UTF-8'
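
The workaround above hard-codes UTF-8. A slightly more portable sketch, assuming the file name should be encoded with whatever sys.getfilesystemencoding() reports, could look like the following. to_fs_bytes is a hypothetical helper, not part of xml.sax, and the fallback to "utf-8" is an assumption for platforms where no filesystem encoding is reported:

```python
import sys


def to_fs_bytes(filename, encoding=None):
    """Encode a unicode file name to bytes before passing it to the parser.

    Hypothetical helper. If `encoding` is not given, the filesystem encoding
    is used, falling back to UTF-8 (an assumption) when none is reported.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding() or "utf-8"
    if isinstance(filename, bytes):
        # Already a byte string: leave it untouched.
        return filename
    return filename.encode(encoding)


# Intended use on Python 2.7, mirroring the workaround in this report:
#   parse(to_fs_bytes(u"\xe5.timeline"), ContentHandler())
```

This only sidesteps the error on the caller's side; it does not change how the parser handles unicode system ids.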

I heard from another user that this was not a problem with Python 3.1.2.

----------
components: XML
messages: 128212
nosy: ricli85
priority: normal
severity: normal
status: open
title: Sax parser crashes if given unicode file name
type: crash
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11159>
_______________________________________

