Unicode error in sax parser
Stefan Behnel
stefan_ml at behnel.de
Tue Feb 8 12:00:46 EST 2011
Rickard Lindberg, 08.02.2011 16:57:
> Hi,
>
> Here is a bash script to reproduce my error:
>
> #!/bin/sh
>
> cat > å.timeline <<EOF
> <?xml version="1.0" encoding="utf-8"?>
> <timeline>
> <version>0.13.0devb38ace0a572b+</version>
> <categories>
> </categories>
> <events>
> <event>
> <start>2011-02-01 00:00:00</start>
> <end>2011-02-03 08:46:00</end>
> <text>asdsd</text>
> </event>
> </events>
> <view>
> <displayed_period>
> <start>2011-01-24 16:38:11</start>
> <end>2011-02-23 16:38:11</end>
> </displayed_period>
> <hidden_categories>
> </hidden_categories>
> </view>
> </timeline>
> EOF
>
> python <<EOF
> # encoding: utf-8
> from xml.sax import parse
> from xml.sax.handler import ContentHandler
> parse(u"å.timeline", ContentHandler())
> EOF
>
> If I instead do
>
> parse(u"å.timeline".encode("utf-8"), ContentHandler())
>
> the script runs without errors.
>
> Is this a bug or expected behavior?
Expected behaviour. You cannot parse XML from unicode strings, especially
not when the XML data explicitly declares itself as being encoded in UTF-8.
Parse from a byte string instead, as you do in your fixed code.
Stefan
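
For readers of the archive, here is a minimal sketch of the "parse from
bytes" advice. Note that in the script above the first argument to
parse() is a file name rather than XML data. One way to keep everything
in bytes regardless of how the name is spelled is to open the file
yourself in binary mode and hand the stream to parse(). The handler
class, the helper function names and the shortened sample document are
assumptions made for illustration; they are not part of the original
script.

```python
# -*- coding: utf-8 -*-
import io
from xml.sax import parse, parseString
from xml.sax.handler import ContentHandler

# Shortened stand-in for the timeline file from the post, held as bytes
# so that the declared UTF-8 encoding matches what the parser receives.
XML_BYTES = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<timeline><events><event><text>asdsd</text></event></events></timeline>\n'
)


class ElementNames(ContentHandler):
    """Hypothetical handler that records the name of every start tag."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_bytes(data):
    """Parse an XML document that is already held in a byte string."""
    handler = ElementNames()
    parseString(data, handler)
    return handler.names


def parse_binary_file(path):
    """Open the file in binary mode and pass the byte stream to SAX."""
    handler = ElementNames()
    with io.open(path, "rb") as stream:
        parse(stream, handler)
    return handler.names
```

With the file from the script above in the current directory,
parse_binary_file(u"å.timeline") would be the call to try. Whether the
non-ASCII name itself can be opened still depends on the platform's
file system encoding, which this sketch does not address.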