[Expat-discuss] expat parser parsing utf incorrectly

Aman Teja aman at amsoft.net
Tue Dec 6 13:47:35 CET 2005


When I feed the Expat parser the following XML, it returns output I did
not expect:

"""<?xml version='1.0' encoding='UTF-8'?>

    <methodCall>
    <methodName>jaman.video.addReview</methodName>
    <params>
    <param>
    <value><struct>
    <member><name>sessionID</name>
    <value><string> abc</string></value>
    </member>
    <member><name>videoID</name>
    <value><string>1003</string></value>
    </member>
    <member><name>summary</name>
    <value><string>&#195;&#160;</string></value>
    </member>
    </struct></value>
    </param>
    </params>

    </methodCall> """

 

I expected the parser to recognize &#195;&#160; as the UTF-8 byte sequence
(0xC3 0xA0) for à and treat it as a single character. Instead it returns two
separate one-character strings, u'\xc3' and u'\xa0', which does not look
correct to me.

 

Is this a bug? Any ideas?
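
For reference, here is a minimal sketch that reproduces the behavior outside
the XML-RPC layer. It assumes Python 3 and the standard xml.parsers.expat
binding (the u'...' strings above suggest the original run was on Python 2),
and parse_text is a hypothetical helper name used only for illustration. As
far as I can tell from the XML 1.0 specification, a numeric character
reference names a Unicode code point rather than an encoded byte, which would
explain why two references come back as two characters, while &#224; comes
back as the single character à.

```python
import xml.parsers.expat


def parse_text(xml_bytes):
    """Return all character data Expat reports for the document (hypothetical helper)."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True                    # merge adjacent text callbacks
    parser.CharacterDataHandler = chunks.append  # collect each reported text chunk
    parser.Parse(xml_bytes, True)                # True marks the final chunk of input
    return "".join(chunks)


DECL = b"<?xml version='1.0' encoding='UTF-8'?>"

# Two references, one per UTF-8 byte value: parsed as two code points.
as_byte_refs = parse_text(DECL + b"<string>&#195;&#160;</string>")

# One reference to code point U+00E0 (decimal 224): parsed as one character.
as_code_point = parse_text(DECL + b"<string>&#224;</string>")

# The literal character, encoded as UTF-8 bytes in the document itself.
as_literal = parse_text(DECL + "<string>\u00e0</string>".encode("utf-8"))

print(ascii(as_byte_refs))   # '\xc3\xa0'
print(ascii(as_code_point))  # '\xe0'
print(ascii(as_literal))     # '\xe0'
```

If this reading is right, the sender of the request would need to emit either
&#224; or the raw UTF-8 bytes for à, instead of one reference per byte.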

 

=Aman


