sax barfs on unicode filenames

John Machin sjmachin at lexicon.net
Wed Oct 4 11:35:12 EDT 2006


Diez B. Roggisch wrote:
> Edward K. Ream wrote:
>
> > Hi.  Presumably this is an easy question, but anyone who understands the
> > sax docs thinks completely differently than I do :-)
> >
> >
> >
> > Following the usual cookbook examples, my app parses an open file as
> > follows::
> >
> >
> >
> > parser = xml.sax.make_parser()
> >
> > parser.setFeature(xml.sax.handler.feature_external_ges,1)
> >
> > # Hopefully the content handler can figure out the encoding from the
> > # <?xml> element.
> >
> > handler = saxContentHandler(c,inputFileName,silent)
> >
> > parser.setContentHandler(handler)
> >
> > parser.parse(theFile)
> >
> >
> >
> > Here 'theFile' is an open file.  Usually this works just fine, but when
>
> Filenames are expected to be bytestrings. So what happens is that the
> unicode string you pass as filename gets implicitly converted using the
> default encoding.
>
> You have to encode the unicode string according to your filesystem
> beforehand.

Not if your filesystem supports Unicode names, as Windows does.
Edward's point is that something is (whether by accident or "design")
trying to coerce the Unicode filename to str, and failing.
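
One way to keep the name out of the parser's hands altogether is to open
the file yourself and give sax only a byte stream. A minimal sketch,
written for current Python 3 rather than the Python of this thread;
ElementNameCollector and parse_unicode_path are made-up names for
illustration, and I am assuming the coercion happens on the file's name
rather than on its contents:

```python
import xml.sax
from xml.sax.xmlreader import InputSource


class ElementNameCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records the name of each element as it opens."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_unicode_path(path):
    """Parse the XML file at `path` without handing the path to the parser."""
    handler = ElementNameCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Binary mode: the parser reads the encoding from the <?xml ...?>
    # declaration itself.
    with open(path, "rb") as stream:
        source = InputSource()          # deliberately no system id (no name)
        source.setByteStream(stream)
        parser.parse(source)
    return handler.names
```

Since the InputSource carries no system id, the parser has no filename to
coerce. The likely cost is that relative references to external entities
have no base to resolve against.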



