python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
Stefan Behnel
stefan_ml at behnel.de
Sat Aug 18 13:56:42 EDT 2012
Dmitry Arsentiev, 15.08.2012 14:49:
> Has anybody already run into a problem like this?
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> When I run scrapy, I get
>
>   File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py", line 14, in <module>
>     libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
>
> When I run
> python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'
>
> I get
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> How can I cure it?
>
> Python 2.7
> libxml2-python 2.6.9
> 2.6.11-gentoo-r6
That version of libxml2 is far too old and doesn't support parsing
real-world HTML. IIRC, support for that started with 2.6.21 and was
improved a bit after that.
Install libxml2 2.8.0, as someone already pointed out.
Stefan
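
For anyone hitting the same error, here is a rough sketch of how the
installed bindings could be checked before running scrapy. The helper
names (parse_version, missing_constants, ASSUMED_MINIMUM) are made up for
this example and are not part of libxml2 or scrapy. The 2.6.21 threshold
is only the from-memory figure given above, so treat it as approximate.

```python
import importlib


def parse_version(text):
    """Turn '2.6.9' into (2, 6, 9) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def missing_constants(module, names=("HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR")):
    """Return those entries of `names` that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


# Assumption: threshold quoted from memory in the reply, not verified here.
ASSUMED_MINIMUM = parse_version("2.6.21")


def describe_installed_bindings():
    """Report which of the checked constants the installed bindings lack."""
    try:
        libxml2 = importlib.import_module("libxml2")
    except ImportError:
        return "libxml2 Python bindings are not installed"
    missing = missing_constants(libxml2)
    if missing:
        return "libxml2 is missing: %s" % ", ".join(missing)
    return "libxml2 defines the constants checked here"


print(describe_installed_bindings())
```

Note that version numbers have to be compared as tuples of integers:
as plain strings, "2.6.9" sorts after "2.6.21", which would make the old
release look newer than it is.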