[XML-SIG] Encoding detection in the html parser from libxml2

Wed Feb 8 11:46:01 CET 2006

Hi,

I am parsing HTML documents with the HTML parser from libxml2. If the
encoding is declared in the document it works perfectly, but if it is
not, I think it does not work well (probably because I am doing
something wrong).

According to http://xmlsoft.org/encoding.html the parser should detect
the encoding. So I tested it by putting a UTF-8 word in a file, and the
parser does not detect it (it generates a wrong string). Example:
reducción --> reducciÃ³n.
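
The kind of corruption in this example can be reproduced without libxml2. The following is a minimal sketch, assuming (this is not confirmed anywhere in the post) that the parser fell back to ISO-8859-1 when no encoding was declared. It uses only Python's built-in codecs, and the variable names are illustrative:

```python
# Reproduce the corruption: UTF-8 bytes misread as ISO-8859-1 (Latin-1).
# Assumption: the parser defaulted to Latin-1 because no charset was declared.
original = "reducción"
utf8_bytes = original.encode("utf-8")    # 'ó' is stored as two bytes: C3 B3
garbled = utf8_bytes.decode("latin-1")   # each byte is read as one character
print(garbled)

# The damage is reversible as long as no bytes were lost along the way.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```

If this assumption holds, the bytes in the file are fine and only the encoding guess is wrong, which would point to declaring the encoding explicitly rather than relying on detection.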

I just use the parser as a SAX parser because I do not need a tree, so
to parse the file I use the htmlParseChunk() function and I create the
context with htmlCreatePushParser().
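
One thing that can go wrong with chunked parsing is a multi-byte UTF-8 character being split across two chunks. The sketch below illustrates decoding explicitly before feeding a push-style parser. It deliberately uses the standard library's html.parser instead of libxml2, so it is an analogy and not the libxml2 API. TextCollector and parse_in_chunks are hypothetical names, and the UTF-8 default is an assumption:

```python
import codecs
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical handler that gathers character data, SAX-style."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)


def parse_in_chunks(raw, encoding="utf-8", chunk_size=1024):
    """Feed raw bytes to the parser chunk by chunk, decoding explicitly."""
    # An incremental decoder buffers an incomplete multi-byte sequence
    # until the rest of it arrives in a later chunk.
    decoder = codecs.getincrementaldecoder(encoding)()
    parser = TextCollector()
    for start in range(0, len(raw), chunk_size):
        parser.feed(decoder.decode(raw[start:start + chunk_size]))
    parser.feed(decoder.decode(b"", final=True))  # flush the decoder
    parser.close()                                # flush the parser
    return "".join(parser.pieces)


raw = "<p>reducción</p>".encode("utf-8")
# An 11-byte chunk cuts the two-byte 'ó' in half on purpose.
text = parse_in_chunks(raw, chunk_size=11)
print(text)
```

The design choice here is to take encoding guesswork away from the parser: the caller states the encoding, and the parser only ever sees already-decoded text.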

Is it possible that encoding detection does not work with
htmlParseChunk()? If so, what method should I use?
Thanks, Cesar