Re: [lxml] Parsing HTML files with HTML entities
April 21, 2011
1:54 p.m.
On 20 apr 2011, at 18.51, Stefan Behnel wrote:
I can see the problem. I'm using lxml to manipulate my HTML document, so validation is not that important. It would be great if lxml did not automatically strip all HTML entities when no DTD is loaded.

I can see that the solution for now is to specify a DTD that defines all the HTML entities. But how can I specify a DTD and still use lxml.html.fragment_fromstring()? When I parse a fragment, the fragment has no doctype declared; only my full HTML document has one. And I can't see a way of specifying a DTD in the parser options.

Suggestions? Thanks for all your help!

//Henrik
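For reference, the fragment case described above can be sketched as follows. This is only a minimal illustration, assuming the fragment goes through lxml's default HTML parser rather than an XML parser; the sample string is made up for the example and is not taken from the original document. With the HTML parser, the standard named entities are turned into the corresponding Unicode characters in the tree, without any doctype or DTD being supplied.

```python
import lxml.html

# Hypothetical sample fragment: no doctype, three named HTML entities.
fragment = "<p>caf&eacute; &amp; tea&nbsp;time</p>"

# fragment_fromstring() uses lxml's HTML parser by default; no DTD is passed.
element = lxml.html.fragment_fromstring(fragment)

# The entities appear in the tree as plain Unicode characters.
print(element.tag)
print(repr(element.text))
```

Note that the tree holds the resolved characters, not the original entity references, so how they are written back out depends on the serialization options used.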