29 Nov 2007
7:56 p.m.
On Thursday, 29.11.2007, at 19:41 +0100, Artur Siekielski wrote:
>> No, I think the better way would be to parse the document, look for the encoding (either via <tree>.docinfo.encoding or by searching for the <meta> tag with find()), and then reparse the unaltered document, this time passing the "encoding" keyword. This is what Stefan suggests: http://article.gmane.org/gmane.comp.python.lxml.devel/3001/
> Hi, thanks for the suggestion. But how can I pass the "encoding" keyword? Neither etree.parse nor etree.HTMLParser supports it.
Oh, I'm sorry. This is only supported by the alpha of lxml 2.0; I simply overlooked that. So for the time being, serialising and reparsing might be the best option, but I haven't tried it.

Cheers,
Frederik
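The two-pass approach discussed in this thread (find the declared encoding, then reparse the unaltered bytes with it) can be sketched as follows. This is a minimal illustration under stated assumptions: it swaps in Python's standard-library `html.parser` for lxml, covers only the `<meta>` route (not `docinfo.encoding`), and the helper names `detect_declared_charset` and `decode_html` are hypothetical.

```python
from html.parser import HTMLParser
import re

# Matches the charset parameter inside a Content-Type value,
# e.g. "text/html; charset=iso-8859-2".
_CHARSET_RE = re.compile(r"charset\s*=\s*['\"]?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


class _MetaCharsetFinder(HTMLParser):
    """First pass: remember the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if self.charset is not None or tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if "charset" in attrs:  # <meta charset="...">
            self.charset = attrs["charset"].strip() or None
        elif attrs.get("http-equiv", "").lower() == "content-type":
            match = _CHARSET_RE.search(attrs.get("content", ""))
            if match:
                self.charset = match.group(1)
    # The default handle_startendtag calls handle_starttag,
    # so self-closing <meta ... /> tags are handled as well.


def detect_declared_charset(data: bytes, default: str = "utf-8") -> str:
    # latin-1 maps every byte to exactly one code point, so the ASCII
    # markup survives intact even before the real encoding is known.
    finder = _MetaCharsetFinder()
    finder.feed(data.decode("latin-1"))
    finder.close()
    return finder.charset or default


def decode_html(data: bytes) -> str:
    # Second pass: decode the *unaltered* bytes with the declared encoding.
    return data.decode(detect_declared_charset(data))
```

With lxml 2.0 or later, the detected name could instead be passed as the `encoding` keyword of `etree.HTMLParser` for the second parse, which is the route the thread describes.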