htmlfile doesn't escape attribute values
Hello,

This is a heads-up for bug 1594155 here:
https://bugs.launchpad.net/lxml/+bug/1594155

Please consider the following test case.

>>> from lxml import html
>>> from lxml.etree import htmlfile
>>> from lxml.html.builder import E
>>> from StringIO import StringIO
>>> out = StringIO()
>>> with htmlfile(out) as f:
...     with f.element("tagname", attrib={"attr": '"misquoted"'}):
...         f.write("foo")
...
>>> out.getvalue()
'<tagname attr=""misquoted"">foo</tagname>'

Expected output:

'<tagname attr="&quot;misquoted&quot;">foo</tagname>'

Lack of proper escaping confuses the hell out of browsers :) Proper escaping is needed to safely put HTML documents inside the srcdoc attribute of an <iframe>.

The workaround is to quote the data before putting it inside an attribute. Fortunately, replacing only " with &quot; is enough, since & is also not escaped (but it should be).

versions:

    Python           : sys.version_info(major=2, minor=7, micro=11, releaselevel='final', serial=0)
    lxml.etree       : (3, 6, 0, 0)
    libxml used      : (2, 9, 3)
    libxml compiled  : (2, 9, 3)
    libxslt used     : (1, 1, 28)
    libxslt compiled : (1, 1, 28)

Best regards,
Burak
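The workaround mentioned in the report (quoting the data before it goes into an attribute) could look roughly like the sketch below. The helper name quote_attr_value is hypothetical and not part of lxml. It assumes the behaviour described in the report, namely that the affected htmlfile version writes attribute values out verbatim and does not escape & either.

```python
def quote_attr_value(value):
    """Hypothetical helper: pre-escape double quotes for an HTML attribute.

    Only '"' is replaced. This relies on the assumption, taken from the
    report, that the affected htmlfile version emits attribute values
    verbatim, so the '&' in '&quot;' is not escaped a second time.
    """
    return value.replace('"', "&quot;")


# Build the attribute dict with the pre-quoted value.
attrib = {"attr": quote_attr_value('"misquoted"')}
print(attrib["attr"])
```

The resulting dict would then be passed as attrib= to f.element("tagname", ...) as in the test case. On an lxml version that escapes attribute values itself, this pre-quoting would presumably be escaped twice, so it should only be applied where the bug is present.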
Hello All,

On 06/22/16 13:32, Burak Arslan wrote:
> Hello,
>
> This is a heads-up for bug 1594155 here:
> https://bugs.launchpad.net/lxml/+bug/1594155

See https://github.com/lxml/lxml/pull/219 to follow up on this.

Best,
Burak