On Thursday, 29.11.2007, at 18:21 +0100, Artur Siekielski wrote:
Yes, with h1 there is the same error. But I noticed that when I moved the meta tag with the charset declaration before <title>, all parsing goes OK, including the h1 tag. So is it a libxml2 bug/limitation (I tried the latest libxml2 from trunk and it's the same)?
I'm parsing 3rd party HTML, so I must find some workaround. Is this a good solution: parse the HTML, change the order of the elements in <head>, serialize the document and parse it again?
No, I think the better way would be to parse it, look for the encoding (either by checking <tree>.docinfo.encoding or by looking for the meta tag with find()), and then reparse the unaltered document, this time using the parser's "encoding" keyword. This is what Stefan suggests: http://article.gmane.org/gmane.comp.python.lxml.devel/3001/

Cheers,
Frederik
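
A minimal sketch of the two-pass approach described above, assuming the raw HTML is available as bytes. The helper names declared_charset() and parse_html_two_pass() are made up for illustration and are not part of lxml; the sketch reads the charset from the meta tag, and checking docinfo.encoding on the first tree would be the other option mentioned above.

```python
from io import BytesIO

from lxml import etree


def declared_charset(root):
    """Return the charset named in a <meta> tag, or None if none is found.

    Hypothetical helper: it only handles the two common declaration forms.
    """
    for meta in root.iter("meta"):
        # Short form: <meta charset="...">
        charset = meta.get("charset")
        if charset:
            return charset.strip()
        # Long form: <meta http-equiv="Content-Type"
        #                  content="text/html; charset=...">
        content = meta.get("content", "")
        lowered = content.lower()
        if "charset=" in lowered:
            start = lowered.index("charset=") + len("charset=")
            return content[start:].split(";")[0].strip(" \"'")
    return None


def parse_html_two_pass(raw):
    """Parse raw HTML bytes, honouring a charset declared anywhere in <head>."""
    # Pass 1: let the parser guess. Text before the meta tag may be decoded
    # wrongly here, so this tree is only used to read the declaration.
    first = etree.parse(BytesIO(raw), etree.HTMLParser())
    encoding = declared_charset(first.getroot())
    if encoding is None:
        # Nothing declared: the first parse is the best we have.
        return first
    # Pass 2: reparse the unaltered bytes with the encoding given explicitly.
    return etree.parse(BytesIO(raw), etree.HTMLParser(encoding=encoding))
```

The returned object is an ordinary ElementTree, so calls such as tree.findtext(".//title") work as usual. The original bytes are never modified or reserialized; only the parser configuration changes between the two passes.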