[lxml-dev] Any way to pass encoding to html.html_parser?
Hello. A simple question about a new feature in lxml 2.0alpha3:
* Parsers accept an 'encoding' keyword argument that overrides the encoding of the parsed documents.
How can I pass the encoding argument to the parser when using html.parse instead of etree.parse?
js wrote:
Hmm, true, you can't currently do that, as lxml.html.html_parser is a parser instance, not a class. It's easy to build an equivalent parser, though. The next release will duplicate the parser class into lxml.html. Until then, you can do this:

    class HTMLParser(lxml.etree.HTMLParser):
        def __init__(self, **kwargs):
            super(HTMLParser, self).__init__(**kwargs)
            self.setElementClassLookup(lxml.html.HtmlElementClassLookup())

Stefan
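
For reference, here is a minimal sketch of how the workaround might be used end to end. It assumes a current lxml release, where the lookup setter is spelled set_element_class_lookup rather than setElementClassLookup. The class name EncodingAwareHTMLParser and the sample bytes are made up for illustration and are not part of lxml. Later lxml releases also ship lxml.html.HTMLParser, which accepts the same encoding keyword, so a subclass like this should only be needed where that class is unavailable.

```python
import io

import lxml.etree
import lxml.html


class EncodingAwareHTMLParser(lxml.etree.HTMLParser):
    """Hypothetical name: an etree.HTMLParser that builds lxml.html elements."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Assumption: a current lxml release, where the camelCase
        # setElementClassLookup() is spelled set_element_class_lookup().
        self.set_element_class_lookup(lxml.html.HtmlElementClassLookup())


# Illustrative input: Latin-1 bytes with no charset declaration in the markup.
raw = "<html><body><p>caf\u00e9</p></body></html>".encode("iso-8859-1")

# The 'encoding' keyword overrides whatever the parser would otherwise guess.
parser = EncodingAwareHTMLParser(encoding="iso-8859-1")
tree = lxml.html.parse(io.BytesIO(raw), parser=parser)
paragraph = tree.getroot().find(".//p")
```

Passing the parser explicitly to html.parse is what replaces the module-level html_parser instance, so the resulting elements should still be lxml.html elements while the encoding is taken from the keyword argument.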
participants (2): js, Stefan Behnel