[lxml-dev] PATCH for formatting XML output

Hi, I'm a bit new to the project, but I needed to nicely format the output from lxml. I also need to keep the <?xml version="1.0"?> header at the top. So I made a few changes to the write procedure. It now accepts two more arguments:

format=0/1 (0 = default): whether or not the output should be pretty printed
strip=0/1 (1 = default): whether or not the xml document definition (the <?xml ...?> header) should be stripped when using us-ascii or utf-8 encoding

I've attached the patch to this email.

--Patrick
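To make the two proposed flags concrete, here is a minimal sketch of their semantics. It is not the attached patch: it assumes the standard library's xml.etree.ElementTree (Python 3.9+ for ET.indent) as a stand-in for lxml, and write_xml is a hypothetical helper name used only for illustration.

```python
import copy
import io
import xml.etree.ElementTree as ET


def write_xml(tree, file, encoding="us-ascii", format=False, strip=True):
    """Hypothetical stand-in for the patched write(); not lxml's actual code."""
    # Work on a copy so that pretty printing does not alter the caller's tree.
    root = copy.deepcopy(tree.getroot())
    if format:
        # ET.indent (Python 3.9+) inserts indentation whitespace in place.
        ET.indent(root)
    # The header is dropped only for us-ascii/utf-8, and only when strip is true.
    strippable = encoding.lower() in ("us-ascii", "utf-8")
    keep_header = not (strip and strippable)
    ET.ElementTree(root).write(file, encoding=encoding, xml_declaration=keep_header)


tree = ET.ElementTree(ET.fromstring("<a><b>x</b></a>"))

compact = io.BytesIO()
write_xml(tree, compact, encoding="utf-8")  # defaults: no formatting, header stripped

pretty = io.BytesIO()
write_xml(tree, pretty, encoding="utf-8", format=True, strip=False)
```

With the defaults, the output is the same as a plain write, which is the backwards compatibility argument made in this thread.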

Patrick Wagstrom wrote:
Since this isn't part of the ElementTree API (which lxml is heading to conform to), I'd personally prefer having output formatters implemented in an external formatting class rather than _ElementTree. Something like PrettyPrint in xml.dom.ext. Others may have different opinions on this. Stefan

On Sun, 2005-10-30 at 14:40 +0100, Stefan Behnel wrote:
Once again, newbie disclaimer applies. I've done some digging on this, and all of the methods seem like they're going to require some sort of large performance hit, mainly because I'd be reimplementing a large portion of what libxml2 already does underneath lxml. That's why I decided to add the two extra arguments to write and make them optional, so in the purest sense, compatibility is maintained (ElementTree programs should work fine with lxml, though not 100% the other way around, I guess). This made sense to me because I sorta saw lxml as a bit of a successor. I did, however, see one bright spot. Apparently there may be a pretty printer in ElementTree at some point in the future as part of the ElementLib module. However, the last posting I can find relating to this is from March 2004[1], and I haven't been able to find the development version of ElementTree (maybe I'm just not looking in the right spot). If we're shooting for full compatibility, then lxml should follow the same syntax. Anyone know where I could find the proposed changes to ElementTree? Thanks! --Patrick [1] http://effbot.org/zone/element-lib.htm

Patrick Wagstrom wrote:
I didn't mean to use those classes, I was just commenting on the API. In the background, you'd obviously reuse what libxml2 has to offer.
Well, having given it a bit more thought, I don't oppose your way of adding it anymore. Since it's both backwards compatible and an obvious enhancement of the API, why not just add it? Still, one thing: do not use 0/1 for the format argument. That's C-ish. You're working on a Python API here, so make that True/False. And: please, add a test case in tests/test_etree.py !
This made sense to me because I sorta saw lxml as a bit of a successor.
Actually, the real successor is cElementTree. :)
Since this is pretty old and V1.3 still seems to be pretty far from the door, maybe you'd have to ask Fredrik Lundh to see how real this extension has become. Otherwise, just go with the keyword arguments. Stefan

Hey, I just read the interesting discussion in the thread. Thanks guys! Using keyword arguments (with True/False, or perhaps some status code if more options are possible) seems like a reasonable approach. Perhaps the default signature should include: pretty_print=False and prologue=False or something like that. The whole prologue story is a bit messy in lxml by the way; I did some hackery to create ElementTree compatibility in not showing the prologue but perhaps we can do something saner than what I did... Regards, Martijn

Hi there, Thinking about this some more, it might be nice to get a document/doctest specifically about serializing XML in various ways, pretty printing, c14n, and so on, all in one place. We can then include this in the doc directory. Any volunteers to write a little story with examples? Regards, Martijn
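As a rough idea of what such a story could cover, here is a short sketch of a few serialization variants side by side. It assumes the standard library's xml.etree.ElementTree (Python 3.8+) rather than lxml, so the exact calls and output in lxml may differ; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, used only for illustration.
SOURCE = '<doc b="2" a="1"><empty/></doc>'
root = ET.fromstring(SOURCE)

# Default: bytes in us-ascii, without a prologue.
as_bytes = ET.tostring(root)

# encoding="unicode" returns a str instead of bytes.
as_text = ET.tostring(root, encoding="unicode")

# Explicitly request the <?xml ...?> prologue.
with_prologue = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Canonical XML (C14N 2.0): attributes are sorted and empty elements
# are written as explicit start/end tag pairs.
canonical = ET.canonicalize(SOURCE)
# canonical == '<doc a="1" b="2"><empty></empty></doc>'
```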

Martijn Faassen wrote:
Now that you mention it: Some of the non-output-related test cases look sub-optimal to me, as they test for a specific XML output instead of specific properties of the result tree. These are easily broken when we start fiddling around with the XML serialization, even without changing the parts they are supposed to test... I was too lazy to change them so far; my remaining patch is sufficiently big and conflict-prone already. Maybe it's best to use reverse-test-driven development: If the test breaks while you were doing something unrelated, it's time to fix it. :) That's also the simplest way of determining the right person for the clean-up. Stefan :]
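One way to read the point about testing tree properties is sketched below. It assumes the standard library's xml.etree.ElementTree in place of lxml, and shape is a hypothetical helper, not something from the lxml test suite. The helper deliberately ignores tail text and surrounding whitespace, which is itself an assumption about what a given test cares about.

```python
import xml.etree.ElementTree as ET


def shape(element):
    """Hypothetical helper: reduce a tree to tags, attributes and stripped text."""
    text = (element.text or "").strip()
    children = [shape(child) for child in element]
    return (element.tag, dict(element.attrib), text, children)


compact = ET.fromstring('<root><child id="1">hello</child></root>')
pretty = ET.fromstring('<root>\n  <child id="1">hello</child>\n</root>')

# A comparison of serialized strings would fail here, because the
# whitespace differs. Comparing the shapes does not.
same_tree = shape(compact) == shape(pretty)
strings_differ = ET.tostring(compact) != ET.tostring(pretty)
```

A test written against shape keeps passing when the serializer starts indenting its output, so it only breaks when the part it is supposed to test actually changes.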

participants (3)
- Martijn Faassen
- Patrick Wagstrom
- Stefan Behnel