Stefan Behnel wrote:
I'd finally like to get my "bigfat.patch" a bit smaller. The main additions that are still pending are these four:
* SAX I/O
Okay, starting to take a look at your SAX work now, giving my feedback, thoughts and questions as I go. I understand that you implemented a ElementTree builder that consumes Python SAX events first, and then later on you created a Python-style SAX event producer on top of the ElementTree API. Very cool! I have quite a bit of SAX code lying around that may eventually benefit from this. It'd be nice to eventually have a Python SAX producer on top of libxml2's SAX2 APIs directly as well. First a few small Python style issues I noticed: * I want to stick with a character limit per line of 80 characters. Longer lines should be broken up. If they can be broken 'naturally' with a newline due to () and such that's preferable to the use of the backslash. * I see you paired for instance startElementNS and endElementNS together without an empty line between them. I'd like to stick to empty lines between all method definitions. * I prefer the use of the unicode literal u'' to a barebones unicode() call. I think it's a bit clearer to use the literal for this. Now on to more details of the code, focusing on the SaxTreeBuilder class for now (I'll look at saxify later). * I presume that due to the namespace reconciliation when adding elements repeating the ns_mapping each time you create an element is safe and won't result in a lot of namespace definitions in the serialized XML, correct? * This code is *almost* compatible with ElementTree proper, except for the namespaces bit. For future potential reusability it might be nicer indeed to put it in a separate module instead of integrating it into etree.pyx. * Hm, wouldn't characters() be a bit nicer like this? (untested) parent = self._element_stack[-1] try: # the text comes after another element, thus tail element = parent[-1] element.tail = (element.tail or u'') + data except IndexError: # the text is before any element in the parent, thus text parent.text = (parent.text or u'') + data * while this shouldn't matter normally, it might be nice to stick with the argument names for the content handler defined in the Python library, like name and qname. * I prefer tuple unpacking of the 'tag_tuple' ('name'), like this. I also don't think it's necessary to have a template for such a simple thing as "{%s}%s"; repeating it a few times is actually less text than using the template... No need to make it a unicode string either, though it doesn't matter much in this case. (It would matter more for plain ElementTree -- this stores ascii strings where possible, so *not* forcing things into unicode might help it slightly, though I think SAX delivers unicode always so it wouldn't matter after all..). Also look at the bug I think I found (see comment). ns_uri, localname = name and then: if ns_uri: el_name = '{%s}%s' % (ns_uri, localname) elif self._default_ns: # order seems reversed in your code. Is this a bug or my # misunderstanding? Would be nice to have a test which # exercises this code! el_name = '{%s}%s' % (self._default_ns, localname) else: el_name = localname Anyway, going home now and not entirely done with the review yet. Hope this was useful! Lots of this was about minor matters of taste and style so far, but it's good to have a uniform style in lxml... Regards, Martijn