htmlfile doesn't escape attribute values

Hello,
This is a heads-up for bug 1594155 here: https://bugs.launchpad.net/lxml/+bug/1594155
Please consider the following test case.
>>> from lxml import html
>>> from lxml.etree import htmlfile
>>> from lxml.html.builder import E
>>> from StringIO import StringIO
>>> out = StringIO()
>>> with htmlfile(out) as f:
...     with f.element("tagname", attrib={"attr": '"misquoted"'}):
...         f.write("foo")
...
>>> out.getvalue()
'<tagname attr=""misquoted"">foo</tagname>'
Expected output:
'<tagname attr="&quot;misquoted&quot;">foo</tagname>'
Lack of proper escaping confuses the hell out of browsers :) Proper escaping is needed to safely put HTML documents inside the srcdoc attribute of an <iframe>.
The workaround is to escape the data yourself before putting it inside an attribute. Fortunately, replacing " with &quot; is enough: & is not escaped either (though it should be), so the entity passes through unchanged.
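For reference, the workaround could be sketched roughly as below. The helper name escape_quotes_for_htmlfile is made up for illustration and is not part of lxml; the sketch assumes an affected version (such as the 3.6.0 listed under versions) where htmlfile writes attribute values out unescaped.

```python
def escape_quotes_for_htmlfile(value):
    """Hypothetical helper: pre-escape double quotes in an attribute value.

    Only '"' is replaced. '&' is deliberately left alone because, per the
    report, the affected htmlfile writes '&' through unescaped, so the
    '&quot;' produced here should reach the output as-is.
    """
    return value.replace('"', '&quot;')


# The escaped value would then be passed to f.element(..., attrib=attrib).
attrib = {"attr": escape_quotes_for_htmlfile('"misquoted"')}
print(attrib["attr"])
```

Note that on a version where htmlfile escapes attribute values itself, this helper would cause double escaping (&amp;quot;), so it should only be applied to affected versions.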
versions:
Python : sys.version_info(major=2, minor=7, micro=11, releaselevel='final', serial=0)
lxml.etree : (3, 6, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)
Best regards, Burak

Hello All,
On 06/22/16 13:32, Burak Arslan wrote:
> Hello,
> This is a heads-up for bug 1594155 here: https://bugs.launchpad.net/lxml/+bug/1594155
See https://github.com/lxml/lxml/pull/219 to follow up on this.
Best, Burak