[lxml-dev] Any way to pass encoding to html.html_parser?

Hello. A simple question about a new feature in lxml 2.0alpha3:
* Parsers accept an 'encoding' keyword argument that overrides the encoding of the parsed documents.
How can I pass the encoding argument to the parser when using html.parse instead of etree.parse?

js wrote:
Hmm, true, you can't currently do that, as lxml.html.html_parser is a parser instance, not a class. It's easy to build an equivalent parser, though. The next release will duplicate the parser class into lxml.html, until then, you can do this:

    class HTMLParser(lxml.etree.HTMLParser):
        def __init__(self, **kwargs):
            super(HTMLParser, self).__init__(**kwargs)
            self.setElementClassLookup(lxml.html.HtmlElementClassLookup())

Stefan
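
For reference, here is a minimal sketch of how the subclass from the reply might be used with html.parse. It assumes a recent lxml release, where the lookup method is spelled set_element_class_lookup() rather than setElementClassLookup(). The class name EncodingHTMLParser and the sample document are made up for illustration.

```python
import io

import lxml.etree
import lxml.html


class EncodingHTMLParser(lxml.etree.HTMLParser):
    """Hypothetical name: an etree HTML parser that builds lxml.html elements."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Assumption: a recent lxml, where the method is named
        # set_element_class_lookup() instead of setElementClassLookup().
        self.set_element_class_lookup(lxml.html.HtmlElementClassLookup())


# Sample document (illustrative) encoded as Latin-1, with no charset declaration.
raw = "<html><body><p>caf\u00e9</p></body></html>".encode("iso-8859-1")

# The 'encoding' keyword overrides whatever encoding the parser would assume.
parser = EncodingHTMLParser(encoding="iso-8859-1")
tree = lxml.html.parse(io.BytesIO(raw), parser=parser)
paragraph = tree.getroot().find(".//p")
```

The parser is handed to html.parse through its parser argument, so the rest of the calling code stays the same as with the default html_parser instance.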

Participants (2): js, Stefan Behnel