Mailman 3 generating html with incremental writer - lxml - The Python XML Toolkit


generating html with incremental writer

Burak Arslan

23 Dec 2013

2:56 a.m.

Hi,

Is there a way to pass method='html' to etree.xmlfile()? Or any other way to serialize incrementally to html?

Best,
Burak


Stefan Behnel

23 Dec 2013

3:01 a.m.

Burak Arslan, 23.12.2013 09:56:

...

Is there a way to pass method='html' to etree.xmlfile()? Or any other way to serialize incrementally to html?

No, not currently. (Would be a new feature, maybe "htmlfile()".) You could generate XHTML, though. Stefan
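Stefan's XHTML suggestion can be sketched with the existing etree.xmlfile() API. This is a minimal sketch, assuming lxml is installed; the helper name write_xhtml and the sample rows are illustrative and do not come from the thread.

```python
from io import BytesIO
from lxml import etree

XHTML_NS = "http://www.w3.org/1999/xhtml"

def write_xhtml(rows):
    """Stream a small XHTML document without building the whole tree first."""
    out = BytesIO()
    with etree.xmlfile(out, encoding="utf-8") as xf:
        xf.write_declaration()
        # Declare XHTML as the default namespace on the root element.
        with xf.element("{%s}html" % XHTML_NS, nsmap={None: XHTML_NS}):
            with xf.element("{%s}body" % XHTML_NS):
                for row in rows:
                    with xf.element("{%s}p" % XHTML_NS):
                        xf.write(row)  # plain strings are written as escaped text
    return out.getvalue()

xhtml = write_xhtml(["first", "second"])
print(xhtml.decode("utf-8"))
```

Because this goes through the XML serializer, empty elements and namespaces follow XML rules, which is the trade-off of the XHTML route.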

Burak Arslan

5:40 a.m.

On 12/23/13 11:01, Stefan Behnel wrote:

...

Burak Arslan, 23.12.2013 09:56:

...
Is there a way to pass method='html' to etree.xmlfile()? Or any other way to serialize incrementally to html?

No, not currently. (Would be a new feature, maybe "htmlfile()".)

You could generate XHTML, though.

I'm not too interested in XHTML, but I could work on implementing support for a "method" argument to etree.xmlfile (or write a new html.htmlfile, whichever you deem more appropriate).

Did you actually re-implement bits of xmlNodeDumpOutput in xmlfile in a context-manager-friendly way? If so, should I start by looking at htmlNodeDumpFormatOutput from libxml?

Do you think htmlfile should be a separate function or just a wrapper around xmlfile? I see most of lxml.html is a wrapper around similar calls in lxml.etree, but htmlfile could be different enough to merit being a separate function.

Any pointers/advice about getting this to work would be appreciated.

Best,
Burak

Stefan Behnel

9:51 a.m.

Burak Arslan, 23.12.2013 12:40:

...

On 12/23/13 11:01, Stefan Behnel wrote:

...
Burak Arslan, 23.12.2013 09:56:

...
Is there a way to pass method='html' to etree.xmlfile()? Or any other way to serialize incrementally to html?

No, not currently. (Would be a new feature, maybe "htmlfile()".)

You could generate XHTML, though.

I'm not too interested in XHTML, but I could work on implementing support for a "method" argument to etree.xmlfile (or write a new html.htmlfile, whichever you deem more appropriate).

Sure. I'd prefer having an etree.htmlfile() class. The "xmlfile" class itself is fairly short anyway, it's purely an API class. The real work is done by the _IncrementalFileWriter. I suggest you add a "method" argument to the latter and pass it either OUTPUT_METHOD_XML or OUTPUT_METHOD_HTML from the two frontends. Take care to disallow namespaces in HTML mode. Also, I'm not sure if there is anything to do about self-closing tags in HTML mode. I guess if people use the context manager to create them, they may just have to live with them being split into opening and closing tags ... I guess write_declaration() should raise an error in HTML mode. The rest might even work more or less as it is. Most of the code you need to write might actually end up in tests.

...

Did you actually re-implement bits of xmlNodeDumpOutput in xmlfile in a context-manager-friendly way?

Yes, necessarily. lxml actually replicates a fair bit of libxml2's functionality where the latter doesn't fit its API well enough (or where lxml can do things more efficiently).

...

If so, should I start by looking at htmlNodeDumpFormatOutput from libxml?

No, _writeNodeToBuffer() handles this just fine when you pass the right method value. Stefan
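The structure Stefan proposes (one shared writer that takes a "method" argument, plus two thin frontends) can be illustrated in pure Python. This is not lxml's code: SketchWriter, xmlfile_sketch, htmlfile_sketch and the METHOD_* constants are hypothetical stand-ins, and raising ValueError is an assumption made for the sketch.

```python
from contextlib import contextmanager
from io import StringIO
from xml.sax.saxutils import escape

# Stand-ins for the OUTPUT_METHOD_XML / OUTPUT_METHOD_HTML values named above.
METHOD_XML, METHOD_HTML = "xml", "html"

class SketchWriter:
    """Hypothetical shared writer; XML-mode namespace handling is omitted."""

    def __init__(self, out, method):
        self.out = out
        self.method = method

    def write_declaration(self):
        # Declarations only make sense for XML output.
        if self.method != METHOD_XML:
            raise ValueError("only XML documents have declarations")
        self.out.write("<?xml version='1.0'?>\n")

    @contextmanager
    def element(self, tag):
        # Reject Clark-notation ({ns}tag) names in HTML mode.
        if self.method == METHOD_HTML and tag.startswith("{"):
            raise ValueError("namespaced tags are not allowed in HTML mode")
        self.out.write("<%s>" % tag)
        yield
        self.out.write("</%s>" % tag)  # always split into open and close tags

    def write(self, text):
        self.out.write(escape(text))

def xmlfile_sketch(out):
    return SketchWriter(out, METHOD_XML)

def htmlfile_sketch(out):
    return SketchWriter(out, METHOD_HTML)

buf = StringIO()
writer = htmlfile_sketch(buf)
with writer.element("p"):
    writer.write("a < b")
html_output = buf.getvalue()
print(html_output)
```

The frontends stay trivial, so nearly all behaviour differences live in the one place where the method value is checked.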

Burak Arslan

15 Sep 2014

6:56 a.m.

Hello Stefan,

I just sent you a pull request: https://github.com/lxml/lxml/pull/142

On 12/23/13 17:51, Stefan Behnel wrote:

...

Burak Arslan, 23.12.2013 12:40:

...

On 12/23/13 11:01, Stefan Behnel wrote:

...
Burak Arslan, 23.12.2013 09:56:

...
Is there a way to pass method='html' to etree.xmlfile()? Or any other way to serialize incrementally to html?

No, not currently. (Would be a new feature, maybe "htmlfile()".)

You could generate XHTML, though.

I'm not too interested in XHTML, but I could work on implementing support for a "method" argument to etree.xmlfile (or write a new html.htmlfile, whichever you deem more appropriate).

Sure.

I'd prefer having an etree.htmlfile() class. The "xmlfile" class itself is fairly short anyway, it's purely an API class. The real work is done by the _IncrementalFileWriter. I suggest you add a "method" argument to the latter and pass it either OUTPUT_METHOD_XML or OUTPUT_METHOD_HTML from the two frontends.

Done.

...

Take care to disallow namespaces in HTML mode. Also, I'm not sure if there is anything to do about self-closing tags in HTML mode. I guess if people use the context manager to create them, they may just have to live with them being split into opening and closing tags ...

Nowadays html markup is abused for all sorts of javascript-y reasons, so I left it as it is. I also noticed that html.tostring doesn't suppress namespaces, but my patch does. Should I alter that behaviour?

...

I guess write_declaration() should raise an error in HTML mode.

Done.

...

The rest might even work more or less as it is. Most of the code you need to write might actually end up in tests.

It indeed seems to. Could you advise what kind of tests you think this code needs? Best regards, Burak

Stefan Behnel

7:21 a.m.

Burak Arslan schrieb am 15.09.2014 um 13:56:

...

I just sent you a pull request: https://github.com/lxml/lxml/pull/142

Thanks!

...

On 12/23/13 17:51, Stefan Behnel wrote:

...
Burak Arslan, 23.12.2013 12:40:

...

On 12/23/13 11:01, Stefan Behnel wrote:

...
Burak Arslan, 23.12.2013 09:56:

...
Is there a way to pass method='html' to etree.xmlfile()? Or any other way to serialize incrementally to html?

No, not currently. (Would be a new feature, maybe "htmlfile()".)

You could generate XHTML, though.

I'm not too interested in XHTML, but I could work on implementing support for a "method" argument to etree.xmlfile (or write a new html.htmlfile, whichever you deem more appropriate).

Sure.

I'd prefer having an etree.htmlfile() class. The "xmlfile" class itself is fairly short anyway, it's purely an API class. The real work is done by the _IncrementalFileWriter. I suggest you add a "method" argument to the latter and pass it either OUTPUT_METHOD_XML or OUTPUT_METHOD_HTML from the two frontends.

Done.

See my comments in the pull request.

...

...
Take care to disallow namespaces in HTML mode. Also, I'm not sure if there is anything to do about self-closing tags in HTML mode. I guess if people use the context manager to create them, they may just have to live with them being split into opening and closing tags ...

Nowadays html markup is abused for all sorts of javascript-y reasons, so I left it as it is.

I also noticed that html.tostring doesn't suppress namespaces, but my patch does. Should I alter that behaviour?

I was referring to the xf.element() calls (sorry). If people write out subtrees as HTML that use namespaces somewhere, I think they're pretty much on their own. However, I think it should be an error if you try that directly with xf.element() in HTML mode.

...

...
The rest might even work more or less as it is. Most of the code you need to write might actually end up in tests.

It indeed seems to. Could you advise what kind of tests you think this code needs?

There are a couple of tests for xmlfile(), so the bulk of the code is already tested. What still needs testing is the HTML specific parts, i.e. any changes or additions to the API (including error cases), as well as the actual HTML specific serialisation (i.e. that you actually get what you asked for). Stefan
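The kind of HTML-specific tests Stefan describes might look like the sketch below. It assumes an lxml release that ships etree.htmlfile (the class that came out of this thread); the test class and method names are illustrative and are not taken from lxml's own test suite.

```python
import unittest
from io import BytesIO
from lxml import etree

class HtmlFileSketchTest(unittest.TestCase):
    def test_element_and_text(self):
        # Serialisation check: the context manager emits open and close tags.
        out = BytesIO()
        with etree.htmlfile(out) as xf:
            with xf.element("p"):
                xf.write("hello")
        self.assertEqual(out.getvalue(), b"<p>hello</p>")

    def test_declaration_is_rejected(self):
        # Error case: XML declarations are not allowed in HTML mode.
        out = BytesIO()
        with self.assertRaises(etree.LxmlSyntaxError):
            with etree.htmlfile(out) as xf:
                xf.write_declaration()

def run_sketch_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HtmlFileSketchTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_sketch_tests() runs both cases and returns the unittest result object; the first test covers serialisation and the second covers an API error case, matching the two areas named above.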

Burak Arslan

16 Sep 2014

3:29 a.m.

You were not supposed to merge this patch so soon :) There are two unresolved issues: one was posted in another thread, the other is this one, directly related.

On 09/15/14 15:21, Stefan Behnel wrote:

...

...
I also noticed that html.tostring doesn't suppress namespaces, but my patch does. Should I alter that behaviour?

I was referring to the xf.element() calls (sorry). If people write out subtrees as HTML that use namespaces somewhere, I think they're pretty much on their own.

However, I think it should be an error if you try that directly with xf.element() in HTML mode.

    with etree.htmlfile(self._file) as xf:
        xf.write(etree.Element('{some_ns}some_tag'))

doesn't suppress namespaces.

    with etree.htmlfile(self._file) as xf:
        with xf.element("{some_ns}some_tag"):
            pass

does suppress namespaces. This is inconsistent and needs to be fixed. See: https://github.com/plq/lxml/commit/378408d2b6e94a4c91410fc7bde5bba055f54785

You have three options:

1) silently filter namespaces out
2) throw an exception when using HTML serialization with namespaced elements
3) let namespaces pass

I'd choose 1 to make the lives of people who generate XHTML and HTML with the same code easier. It's your decision though.

Best,
Burak

Stefan Behnel

10:55 a.m.

Burak Arslan schrieb am 16.09.2014 um 10:29:

...

you were not supposed to merge this patch so soon :)

It looked ok, though. :)

...

On 09/15/14 15:21, Stefan Behnel wrote:

...
...
I also noticed that html.tostring doesn't suppress namespaces, but my patch does. Should I alter that behaviour?

I was referring to the xf.element() calls (sorry). If people write out subtrees as HTML that use namespaces somewhere, I think they're pretty much on their own.

However, I think it should be an error if you try that directly with xf.element() in HTML mode.

    with etree.htmlfile(self._file) as xf:
        xf.write(etree.Element('{some_ns}some_tag'))

doesn't suppress namespaces

    with etree.htmlfile(self._file) as xf:
        with xf.element("{some_ns}some_tag"):
            pass

does suppress namespaces. this is inconsistent

True.

...

and needs to be fixed.

Hmm, does it? What about this case:

    plain_p = etree.Element('p')
    etree.SubElement(plain_p, '{some_ns}some_tag')
    with etree.htmlfile(self._file) as xf:
        xf.write(plain_p)

Namespaces can be used at any place when writing out subtrees. I wouldn't want to validate the entire tree before writing it out.

...

see: https://github.com/plq/lxml/commit/378408d2b6e94a4c91410fc7bde5bba055f54785

You have three options:

1) silently filter namespaces out
2) throwing an exception when using html serialization with namespaced elements
3) let namespaces pass

I'd choose 1 to make the lives of people who generate xhtml and html with the same code easier.

ISTM that 3) is the simpler and more obvious option, i.e. let libxml2 handle it. 1) would require running through the entire subtree, and if we find (XHTML) namespaces, make a copy of the subtree, remove the namespaces from the copy, and serialise it. That sounds like more than we should impose on users behind their back. Stefan
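The three cases discussed here (a namespaced element written as a subtree, one opened with xf.element(), and one nested inside a plain parent) can be compared side by side. This is a sketch that assumes an lxml release that includes etree.htmlfile; the function names are illustrative, and the exact bytes printed depend on the installed lxml and libxml2 versions, so no output is shown.

```python
from io import BytesIO
from lxml import etree

def via_write():
    # Namespaced element serialised as a ready-made subtree.
    out = BytesIO()
    with etree.htmlfile(out) as xf:
        xf.write(etree.Element("{some_ns}some_tag"))
    return out.getvalue()

def via_element():
    # Namespaced element opened directly through the context manager.
    out = BytesIO()
    with etree.htmlfile(out) as xf:
        with xf.element("{some_ns}some_tag"):
            pass
    return out.getvalue()

def nested_in_plain_parent():
    # Stefan's case: the namespace only shows up deeper in the subtree.
    plain_p = etree.Element("p")
    etree.SubElement(plain_p, "{some_ns}some_tag")
    out = BytesIO()
    with etree.htmlfile(out) as xf:
        xf.write(plain_p)
    return out.getvalue()

results = {
    "write": via_write(),
    "element": via_element(),
    "nested": nested_in_plain_parent(),
}
for label, data in results.items():
    print(label, data)
```

Under option 3, no case needs a pre-pass over the tree: whatever the serializer does with the namespace is what ends up in the output.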

Burak Arslan

18 Sep 2014

2:13 p.m.

Hello,

On 09/16/14 18:55, Stefan Behnel wrote:

...

...
You have three options:

...
1) silently filter namespaces out
2) throwing an exception when using html serialization with namespaced elements
3) let namespaces pass

I'd choose 1 to make the lives of people who generate xhtml and html with the same code easier.

ISTM that 3) is the simpler and more obvious option, i.e. let libxml2 handle it.

Done. See: https://github.com/plq/lxml/compare/lxml:master...master

For your convenience:

    git remote add plq git://github.com/plq/lxml
    git fetch plq
    git merge plq/master

Also, don't forget about void elements. I know they are rare ones, but they are in the spec.

Best,
Burak
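The void-element concern can be tried out with a small sketch, again assuming an lxml release that includes etree.htmlfile. Here the br element is written as a ready-made subtree, so it goes through the HTML serializer; how the same tag behaves when opened with xf.element() may differ between lxml versions and is not shown.

```python
from io import BytesIO
from lxml import etree

out = BytesIO()
with etree.htmlfile(out) as xf:
    with xf.element("p"):
        xf.write("line one")
        xf.write(etree.Element("br"))  # void element written as a subtree
        xf.write("line two")

html_bytes = out.getvalue()
print(html_bytes)
```

A void element such as br must not get a closing tag in HTML, which is why the split into opening and closing tags mentioned earlier in the thread matters for these elements.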

Stefan Behnel

25 Sep 2014

11:59 a.m.

Burak Arslan schrieb am 18.09.2014 um 21:13:

...

On 09/16/14 18:55, Stefan Behnel wrote:

...
...
You have three options:

...
1) silently filter namespaces out
2) throwing an exception when using html serialization with namespaced elements
3) let namespaces pass

I'd choose 1 to make the lives of people who generate xhtml and html with the same code easier.

ISTM that 3) is the simpler and more obvious option, i.e. let libxml2 handle it.

done. see: https://github.com/plq/lxml/compare/lxml:master...master

Thanks. I applied the changes manually and moved them around a bit. Stefan
