[lxml-dev] Target parser parsing error
Hi all,

Here is more information about the parsing error I get when I use a target parser to parse http://www.jiayuan.com/ . The fatal error reported is:

Input is not proper UTF-8, indicate encoding !

To find the place where this problem occurs, I tried to convert the HTML string encoding with iconv directly. This also reports an error, and the character index of the error in the string is exactly the same as in my lxml test. So it seems clear that this parsing error is caused by iconv's encoding conversion from utf-8 to utf-8 when there are illegal characters in the source.

When I do not define the data function in my target parser, it parses without reporting an error. Does that mean that when I leave out the data function, the UTF-8 to UTF-8 conversion is also skipped? Or is some correct conversion done before the call to the data function?

yours
qhlonline wrote:
Here is more information about the parsing error I get when I use a target parser to parse http://www.jiayuan.com/ . The fatal error reported is: Input is not proper UTF-8, indicate encoding ! To find the place where this problem occurs, I tried to convert the HTML string encoding with iconv directly. This also reports an error, and the character index of the error in the string is exactly the same as in my lxml test. So it seems clear that this parsing error is caused by iconv's encoding conversion from utf-8 to utf-8 when there are illegal characters in the source.
What do you mean by "from utf-8 to utf-8" conversion?
When I do not define the data function in my target parser, it parses without reporting an error. Does that mean that when I leave out the data function, the UTF-8 to UTF-8 conversion is also skipped? Or is some correct conversion done before the call to the data function?
It just means that the parser has ignored your character content.

There are two levels here. The libxml2 parser will parse the byte stream and try to convert it to UTF-8. If that fails but it is asked to "recover" from it, it will just continue without raising an error. I'm not sure what becomes of the data in this case, but apparently there is no guarantee that the invalid bytes that were parsed up to this point get stripped.

The second level is where lxml comes into play. When you define a "data()" method on your target parser, you ask lxml to pass you the character data from the document. lxml's SAX handler will then try to decode the UTF-8 data provided by the libxml2 parser in order to pass it into your method. If the data returned by the parser is not valid UTF-8, this will fail. I assume that this is where the exception that you see originates from, as this is done through the Python codec API.

Does this clear things up?

That said, I could imagine letting the character decoder work around broken data if the "recover" option is enabled, simply by replacing broken content with a replacement character. This would improve the recovery capabilities in your case, without breaking the data any further than it already is.

Stefan
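To make the second level concrete, here is a minimal sketch of lxml's target parser interface, assuming lxml is installed. The `Collector` class and the sample HTML are made up for illustration; the point is that `data()` only ever sees text that lxml has already decoded from libxml2's UTF-8 buffer into a Python string.

```python
from lxml import etree

class Collector:
    """Target object: lxml calls data() with decoded character data."""
    def __init__(self):
        self.chunks = []

    def data(self, text):
        # lxml's SAX handler decodes libxml2's UTF-8 buffer to a Python
        # string before this call; invalid UTF-8 would fail right here.
        self.chunks.append(text)

    def close(self):
        # With a target parser, fromstring() returns whatever close() returns.
        return "".join(self.chunks)

parser = etree.HTMLParser(target=Collector())
result = etree.fromstring(b"<html><body><p>hello</p></body></html>", parser)
print(result)
```

If `data()` is not defined on the target, lxml never attempts that decode step for character content, which matches the observation that the error disappears.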
Hi,

2009-06-03, "Stefan Behnel" <stefan_ml@behnel.de> wrote:
qhlonline wrote:
Here is more information about the parsing error I get when I use a target parser to parse http://www.jiayuan.com/ . The fatal error reported is: Input is not proper UTF-8, indicate encoding ! To find the place where this problem occurs, I tried to convert the HTML string encoding with iconv directly. This also reports an error, and the character index of the error in the string is exactly the same as in my lxml test. So it seems clear that this parsing error is caused by iconv's encoding conversion from utf-8 to utf-8 when there are illegal characters in the source.
What do you mean by "from utf-8 to utf-8" conversion?

I am not sure whether this conversion actually takes place. But when I convert the HTML content string this way, it reports an illegal character error at some position in the string, and it is exactly the position where my lxml target parser generates its error. Doesn't that confirm that there are illegal characters in the HTML content?
When I do not define the data function in my target parser, it parses without reporting an error. Does that mean that when I leave out the data function, the UTF-8 to UTF-8 conversion is also skipped? Or is some correct conversion done before the call to the data function?
It just means that the parser has ignored your character content. There are two levels here. The libxml2 parser will parse the byte stream and try to convert it to UTF-8. If that fails but it is asked to "recover" from it, it will just continue without raising an error. Not sure what becomes of the data in this case, but apparently there is no guarantee that the invalid bytes that were parsed up to this point get stripped.
I agree with you. I had wondered what libxml2 would do when an illegal character arrives; your answer makes this clear to me.
The second level is where lxml comes into play. When you define a "data()" method on your target parser, you ask lxml to pass you the character data from the document. lxml's SAX handler will then try to decode the UTF-8 data provided by the libxml2 parser in order to pass it into your method. If the data returned by the parser is not valid UTF-8, this will fail. I assume that this is where the exception that you see originates from, as this is done through the Python codec API.

Yes, that is the case. But the illegal character was introduced outside of lxml and outside of libxml2: the whole string was fetched from a URL using Python's urllib module. So I wonder whether there is some other way to get the HTML content of a URL without illegal characters.
Does this clear things up?

Thank you for your help; I think I have learned more about the lxml parsing process with your guidance. Thank you!
That said, I could imagine letting the character decoder work around broken data if the "recover" option is enabled, simply by replacing broken content with a replacement character. This would improve the recovery capabilities in your case, without breaking the data any further than it already is.
Stefan
Happy days! yours
Hi,

qhlonline wrote:
2009-06-03,"Stefan Behnel" wrote:
The libxml2 parser will parse the byte stream and try to convert it to UTF-8. If that fails but it is asked to "recover" from it, it will just continue without raising an error. Not sure what becomes of the data in this case, but apparently there is no guarantee that the invalid bytes that were parsed up to this point get stripped.
I agree with you. I had wondered what libxml2 would do when an illegal character arrives; your answer makes this clear to me.
Then it's clearer to you than to me. I'm actually not convinced yet that this is the case. I was rather guessing, based on my (limited) knowledge of the problem you observe, which I have never seen in the wild myself. The parser in libxml2 uses levelled buffers that copy the data during decoding. That may already be a sufficient barrier against such problems.

What about posting a self-contained Python module, stripped down to the minimum, that shows the unexpected behaviour? Nothing that accesses the internet or the like, just embed a sufficient part of a failing web page as a string (possibly base64 encoded). That way, others can try to reproduce the problem on their side and debug it.
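A skeleton for such a self-contained test might look like the following, assuming lxml is available. The byte string here is a hypothetical stand-in for the relevant fragment of the real page (`\xfe\xff` is not valid UTF-8); what actually happens with the real data is exactly what posting the test would show, so no particular outcome is claimed.

```python
import base64
from lxml import etree

# In a real report, this would hold the failing fragment of the actual page.
SAMPLE = base64.b64encode(b"<html><body><p>abc\xfe\xffdef</p></body></html>")

class Target:
    """Minimal target parser that collects character data."""
    def __init__(self):
        self.data_chunks = []

    def data(self, text):
        self.data_chunks.append(text)

    def close(self):
        return self.data_chunks

parser = etree.HTMLParser(target=Target(), recover=True)
try:
    chunks = etree.fromstring(base64.b64decode(SAMPLE), parser)
    print("parsed, character data:", chunks)
except (etree.LxmlError, UnicodeDecodeError) as exc:
    print("parser error:", exc)
```

Embedding the data base64-encoded keeps the invalid bytes intact regardless of how the module file itself is saved or mailed.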
The second level is where lxml comes into the play. When you define a "data()" method on your target parser, you ask lxml to pass you the character data from the document. lxml's SAX handler will then try to decode the UTF-8 data provided by the libxml2 parser to pass it into your method. If the data returned by the parser is not valid UTF-8, this will fail. I assume that this is where the exception that you see originates from, as this is done through the Python Codec API.
Yes, that is the case. But the illegal character was introduced outside of lxml and outside of libxml2: the whole string was fetched from a URL using Python's urllib module. So I wonder whether there is some other way to get the HTML content of a URL without illegal characters.
Well, as I said before: if the HTML is broken, there is no way to make sure the parser can read all data 'correctly' (whatever that means in this context). If the web page adheres to an encoding and just fails to declare it correctly, your best bet is to decode the page into a unicode string yourself, catch and handle any decoding errors in a suitable way, and pass that unicode string into the parser.

Stefan
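A minimal sketch of that suggestion, using only the standard library for the decoding step. The byte string stands in for the body returned by `urllib` for the page in question; `errors="replace"` is one of the standard codec error handlers and substitutes U+FFFD for each undecodable byte.

```python
# Stand-in for: raw = urllib.request.urlopen("http://www.jiayuan.com/").read()
raw = b"<p>ok \xfe broken</p>"

# Decode into a unicode string ourselves, handling bad bytes explicitly,
# instead of letting the parser hit them during its own UTF-8 decoding.
text = raw.decode("utf-8", errors="replace")
print(text)

# The cleaned unicode string can then be handed to the parser, e.g.:
#   tree = lxml.etree.fromstring(text, lxml.etree.HTMLParser())
```

Other error handlers ("ignore", or a custom handler registered via `codecs.register_error`) work the same way; the key point is that the error handling happens before the data reaches libxml2.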
participants (2)
-
qhlonline
-
Stefan Behnel