[lxml-dev] XMLFormatter revisited
Hi all,

coming back to what I proposed a while ago, we currently have this:

    class XMLParser:
        def __init__(self, **options):
            self.options = options

    class HTMLParser:
        ...

    doc = ET.parse(source, parser=XMLParser(configuration))

At the time, I brought up an equivalent for output formatting:

    class XMLFormatter:
        ...

That would give us a nice, symmetric API for input and output options. It also lets you use subclasses to provide different formats:

    class XMLPrettyPrinter(XMLFormatter):
        def __init__(self):
            self._pretty_print = True

    class XHTMLFormatter(XMLFormatter):
        def __init__(self):
            self._xhtml = True

    xml = ET.tostring(element, formatter=XMLPrettyPrinter())
    ET.parse("myfile.xhtml").write("out.xhtml", formatter=XHTMLFormatter())

After Noah's latest approach in that direction, I think we should adopt this API for 1.0. Version 1.0 is supposed to provide a stable and reasonably future-proof API, and constantly adding keyword arguments to various API functions is not what I call future-proof. One reason is that you can't test for the availability of a keyword argument, so features added at that level are hard to handle for code that wants to use them only where they are available.

So, I propose to replace the current pretty_print keyword (which only appeared in the beta version anyway) with a new XMLPrettyPrinter class, and to provide new features preferably as subclasses of XMLFormatter (e.g. XHTMLFormatter). An alternative name would be XMLSerializer, which may be more general. We could also keep the pretty_print keyword as a shortcut, but that would obviously make things a bit more complicated internally. Maybe pretty printing will just become a general keyword of the XMLFormatter class; I guess that would make sense.

I will have to check how to make these classes nicely usable internally, but that's a minor problem. We already use a lot of xmlBuffer code, so that should give us a common ground for this API.
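To make the proposed symmetry concrete, here is a minimal sketch in plain Python on top of the standard library's xml.etree.ElementTree rather than lxml. The class names XMLFormatter and XMLPrettyPrinter come from the proposal; the formatter-taking tostring() function and the pick_formatter() helper are hypothetical illustrations, not an existing lxml or ElementTree API. The sketch also shows why a class is easy to feature-test: a getattr() lookup is enough.

```python
import copy
import xml.etree.ElementTree as ET


class XMLFormatter:
    """Hypothetical output-options object, the counterpart of a parser object."""

    def __init__(self, pretty_print=False, indent="  "):
        self.pretty_print = pretty_print
        self.indent = indent

    def format(self, element):
        """Serialize *element* to a str according to this formatter's options."""
        if self.pretty_print:
            # Indent a copy so that the caller's tree is left unchanged.
            element = copy.deepcopy(element)
            ET.indent(element, space=self.indent)  # needs Python 3.9+
        return ET.tostring(element, encoding="unicode")


class XMLPrettyPrinter(XMLFormatter):
    """A subclass that only fixes one option, as suggested in the mail."""

    def __init__(self):
        super().__init__(pretty_print=True)


def tostring(element, formatter=None):
    """Hypothetical tostring() taking one formatter object instead of keywords."""
    if formatter is None:
        formatter = XMLFormatter()
    return formatter.format(element)


def pick_formatter(api):
    """Use a pretty printer only if *api* (e.g. a module) provides one.

    Probing for a class name is a plain attribute lookup, unlike probing
    whether some function accepts a given keyword argument.
    """
    printer_class = getattr(api, "XMLPrettyPrinter", None)
    return printer_class() if printer_class is not None else XMLFormatter()


root = ET.fromstring("<root><child/></root>")
print(tostring(root))  # <root><child /></root>
print(tostring(root, formatter=XMLPrettyPrinter()))
```

Keeping all output options on one object means a new option only touches the formatter classes, not the signature of every serialization function.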
If there are no objections, I'll start getting my hands on this next week, so if anyone has an opinion on this, please speak up soon.

Stefan
Hi all,

Stefan Behnel wrote:
>     class XMLParser:
>         def __init__(self, **options):
>             self.options = options
>
>     class XMLFormatter:
>         ...
>
> That would give us a nice, symmetric API for input and output options.
Oh, well. As usual, it's not as easy as it seems at first sight. I found that such an API does not work that well, neither for the integration with ET nor for the implementation on top of libxml2.

libxml2 supports the xmlSave... API, which has some nice features for formatting XML. However, it also has a number of bugs and some side effects that make it ugly to integrate into a nicely ET-compatible API. One of these nice design decisions was to output a '\n' at the end of the XML output. While this is not much of a problem when saving XML to a file, it is rather ugly when writing to StringIOs and strings, and it's unfortunately non-trivial to remove this character from an encoded string. The alternative would be to use an API that can't write XML declarations. Great!

So I do not currently see a way to support both tostring() and ET.write() on top of the xmlSave* functions.

However, I think it would still be nice to have an API that allows some more fine-tuning of the output, like character-entity conversion or hooks into the serialization process. For this, a separate class XMLSerializer looks like the way to go. I have implemented a simple incarnation of such a beast in the "xmlsave" branch; however, I'm not satisfied enough with it to make it part of lxml 1.0. It's a separate feature anyway, so there is no hurry to integrate it. But if someone wants to take a look at it...

Regards,
Stefan
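Why is removing the trailing '\n' from an encoded string non-trivial? Its byte form depends on the output encoding: b"\n" in UTF-8, b"\n\x00" in UTF-16-LE and b"\x00\n" in UTF-16-BE, so a naive rstrip(b"\n") either misses it or leaves a stray zero byte behind. The following Python sketch is an illustration only, not lxml's or libxml2's code; encoded_newline() and strip_trailing_newline() are hypothetical helpers. It asks the codec how it encodes one newline and removes exactly that suffix, assuming the data ends on a character boundary.

```python
import codecs


def encoded_newline(encoding: str) -> bytes:
    """Return the bytes one newline character occupies mid-stream in *encoding*."""
    encoder = codecs.getincrementalencoder(encoding)()
    # Encode a dummy character first so that any BOM/signature is emitted
    # with it and is not mistaken for part of the newline.
    encoder.encode("x")
    return encoder.encode("\n")


def strip_trailing_newline(data: bytes, encoding: str) -> bytes:
    """Remove one trailing newline from *data*, which is encoded in *encoding*."""
    newline = encoded_newline(encoding)
    if newline and data.endswith(newline):
        return data[: -len(newline)]
    return data
```

For UTF-16-BE, for example, "<a/>\n".encode("utf-16-be").rstrip(b"\n") would leave a dangling b"\x00", while strip_trailing_newline() returns exactly "<a/>".encode("utf-16-be").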
participants (1): Stefan Behnel