Re: [lxml-dev] Re: HTMLParser status and issues
Stefan Behnel wrote:
Paul Everitt wrote:
I got this:
# 175 "/usr/include/libxml2/libxml/HTMLparser.h"
typedef enum {
    HTML_PARSE_NOERROR = 1<<5,
    HTML_PARSE_NOWARNING= 1<<6,
    HTML_PARSE_PEDANTIC = 1<<7,
    HTML_PARSE_NOBLANKS = 1<<8,
    HTML_PARSE_NONET = 1<<11
} htmlParserOption;
That's not libxml2 2.6.22 then. I think your C compiler uses the Mac-OS system libraries instead of the libraries that your xmllint uses.
Back to this issue, the missing option HTML_PARSE_RECOVER came up in libxml2 2.6.21, while Mac-OS Tiger ships with 2.6.16. However, it looks like the HTML_PARSE_* options follow the numeric values of the XML_PARSE_* enum exactly. So, as a work-around, we could use XML_PARSE_RECOVER to make it compile and simply state that libxml2 2.6.21+ is required for parsing broken HTML. That way, it would keep working with the system libraries on Mac-OS X.

Paul, I applied the above change to the branch for now. I'd be glad if you could check that it now compiles with the Mac-OS system libraries. Please run the test suite. If everything works as expected, only the test case(s) for parsing broken HTML should fail.

If there are no objections, I'll then start merging the HTML parser into the trunk.

Stefan
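For readers following along, here is a minimal Python sketch of the reasoning behind the work-around. It is not the code on the branch. The bit values are the ones from libxml2's public headers (the XML table is only the subset relevant here), and the helper names `enums_agree`, `recover_supported` and `html_options` are made up for illustration.

```python
# Subset of libxml2's xmlParserOption bits that have an HTML counterpart.
XML_PARSE_OPTIONS = {
    "RECOVER": 1 << 0,
    "NOERROR": 1 << 5,
    "NOWARNING": 1 << 6,
    "PEDANTIC": 1 << 7,
    "NOBLANKS": 1 << 8,
    "NONET": 1 << 11,
}

# htmlParserOption exactly as quoted from the Mac-OS system header
# (no HTML_PARSE_RECOVER there).
HTML_PARSE_OPTIONS_OLD = {
    "NOERROR": 1 << 5,
    "NOWARNING": 1 << 6,
    "PEDANTIC": 1 << 7,
    "NOBLANKS": 1 << 8,
    "NONET": 1 << 11,
}

# Version named in the thread as the first one with HTML_PARSE_RECOVER.
RECOVER_MIN_VERSION = (2, 6, 21)


def enums_agree(html_options, xml_options):
    """Return True if every HTML option has the same bit as its XML twin."""
    return all(xml_options.get(name) == bit
               for name, bit in html_options.items())


def recover_supported(libxml_version):
    """Hypothetical helper: is broken-HTML recovery expected to work?"""
    return tuple(libxml_version) >= RECOVER_MIN_VERSION


def html_options(recover=True, no_network=True):
    """Hypothetical helper: build the bitmask from XML_PARSE_* names only."""
    flags = 0
    if recover:
        # Stands in for HTML_PARSE_RECOVER, which old headers do not define.
        flags |= XML_PARSE_OPTIONS["RECOVER"]
    if no_network:
        flags |= XML_PARSE_OPTIONS["NONET"]
    return flags
```

With these definitions, `enums_agree(HTML_PARSE_OPTIONS_OLD, XML_PARSE_OPTIONS)` holds for the header quoted above, and `recover_supported((2, 6, 16))` is false for the version Tiger ships, which matches the expectation that only the broken-HTML tests fail there.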
Stefan Behnel wrote:
Paul, I applied the above change to the branch for now. I'd be glad if you could check that it now compiles with the Mac-OS system libraries. Please run the test suite. If everything works as expected, only the test case(s) for parsing broken HTML should fail.
Go away for the weekend and I miss all the fun. :^) Yes, this works.

--Paul
participants (2)
- Paul Everitt
- Stefan Behnel