Forgot to ask the question about status. :^)

First, there are two branches:

http://codespeak.net/svn/lxml/branch/htmlparse/
http://codespeak.net/svn/lxml/branch/htmlparser/

I'm presuming the latter is the one I want. Perhaps the former should get renamed to something less of a decoy?

Next, once I get the parser working, I'd also like to use extensions as described here:

http://codespeak.net/svn/lxml/trunk/doc/extensions.txt

However, the htmlparser branch is older than the extensions work (I believe). Stefan, any chance the htmlparser branch could get the changes from the trunk?

I'm particularly eager to get this combination working. The pipeline templating stuff I'm working on needs to handle non-well-formed HTML. It also needs a workaround for the fact that DOCTYPE (and encoding) information isn't available in the parse tree and thus isn't available in an XSLT template. As a workaround, I'd like to retrieve the information out-of-band and make it available as an extension function.

--Paul

Paul Everitt wrote:
Howdy. I was giving the htmlparser branch a try. In trying to compile it, I got:

python setup.py build_ext -i
running build_ext
building 'lxml.etree' extension
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I/Library/Frameworks/Python.framework/Versions/2.4/include/python2.4 -c src/lxml/etree.c -o build/temp.darwin-8.6.0-Power_Macintosh-2.4/src/lxml/etree.o -w -I/usr/include/libxml2
src/lxml/etree.c: In function '__pyx_f_5etree_10HTMLParser___init__':
src/lxml/etree.c:17245: error: 'HTML_PARSE_RECOVER' undeclared (first use in this function)
src/lxml/etree.c:17245: error: (Each undeclared identifier is reported only once
src/lxml/etree.c:17245: error: for each function it appears in.)
src/lxml/etree.c:17256: error: 'HTML_PARSE_COMPACT' undeclared (first use in this function)
src/lxml/etree.c: In function 'initetree':
src/lxml/etree.c:31135: error: 'HTML_PARSE_RECOVER' undeclared (first use in this function)
src/lxml/etree.c:31135: error: 'HTML_PARSE_COMPACT' undeclared (first use in this function)
error: command 'gcc' failed with exit status 1

--Paul
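
For illustration, the out-of-band DOCTYPE retrieval mentioned in the reply could look roughly like the sketch below. It is only a sketch under assumptions: it uses the Python 3 standard library's html.parser rather than anything in lxml, and the names DoctypeSniffer and sniff_doctype are hypothetical. Exposing the result to XSLT as an extension function is not shown.

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Hypothetical helper: records the first DOCTYPE declaration seen."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # html.parser passes the declaration text without the
        # surrounding "<!" and ">".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def sniff_doctype(markup):
    """Return the DOCTYPE declaration text from markup, or None if absent."""
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.doctype
```

Because html.parser tolerates unclosed tags, this can run on the same non-well-formed input before it is handed to the real parser. The returned string could then be supplied to the template by whatever extension mechanism ends up being available.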