sax barfs on unicode filenames

Edward K. Ream edreamleo at charter.net
Wed Oct 4 11:21:05 EDT 2006


> Filenames are expected to be bytestrings.

The exception happens in a method to which no fileName is passed as an 
argument.

parse_leo_file: 'C:\\prog\\tigris-cvs\\leo\\test\\unittest\\chinese?folder\\chinese?test.leo'
(trace of converted fileName)

Unexpected exception parsing C:\prog\tigris-cvs\leo\test\unittest\chinese?folder\chinese?test.leo
Traceback (most recent call last):
  File "c:\prog\tigris-cvs\leo\src\leoFileCommands.py", line 2162, in parse_leo_file
    parser.parse(theFile)
  File "c:\python25\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "c:\python25\lib\xml\sax\xmlreader.py", line 119, in parse
    self.prepareParser(source)
  File "c:\python25\lib\xml\sax\expatreader.py", line 111, in prepareParser
    self._parser.SetBase(source.getSystemId())
UnicodeEncodeError: 'ascii' codec can't encode character u'\u8116' in position 44: ordinal not in range(128)

To repeat, theFile is an open file.  I believe the actual filename is passed 
nowhere as an argument to sax in my code.  Just to make sure, I converted 
the filename to ascii in my code, and got (no surprise) exactly the same 
crash.  I suppose a workaround would be to pass a file-like object to sax 
instead of an open file, so that source.getSystemId() in prepareParser has 
no unicode filename to hand to SetBase.  But this looks like a bug to me.
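
Here is a minimal sketch of that workaround.  It assumes sax picks up 
the system id from the open file's name attribute, which I have not 
confirmed.  The idea is to wrap the open file in an InputSource whose 
system id is never set, so prepareParser should have nothing to pass to 
SetBase.  ElementCounter and parse_open_file are hypothetical names made 
up for this example (they are not Leo code), and io.BytesIO stands in for 
the real open .leo file.

```python
import io
import xml.sax
from xml.sax import xmlreader


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts start tags to show that parsing ran."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_open_file(the_file, handler):
    """Parse an already-open binary file without giving sax its name."""
    source = xmlreader.InputSource()  # system id is left unset (None)
    source.setByteStream(the_file)    # sax reads the bytes from this object
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return source


# Stand-in for the real file; in practice this would be open(fileName, 'rb').
handler = ElementCounter()
source = parse_open_file(io.BytesIO(b"<leo_file><vnodes/></leo_file>"), handler)
print(handler.count)
```

Because the InputSource is built by hand, the unicode filename never 
reaches sax at all, whatever the file was opened with.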

BTW:

Python 2.5.0, Tk 8.4.12, Pmw 1.2
Windows 5, 1, 2600, 2, Service Pack 2

Edward
--------------------------------------------------------------------
Edward K. Ream   email:  edreamleo at charter.net
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------





More information about the Python-list mailing list