Stefan Behnel wrote:
> I'd finally like to get my "bigfat.patch" a bit smaller. The main
> additions that are still pending are these four:
>
> * SAX I/O
Okay, starting to take a look at your SAX work now, giving my feedback,
thoughts and questions as I go.
I understand that you implemented a ElementTree builder that consumes
Python SAX events first, and then later on you created a Python-style
SAX event producer on top of the ElementTree API. Very cool! I have
quite a bit of SAX code lying around that may eventually benefit from this.
It'd be nice to eventually have a Python SAX producer on top of
libxml2's SAX2 APIs directly as well.
First a few small Python style issues I noticed:
* I want to stick with a character limit per line of 80 characters.
Longer lines should be broken up. If they can be broken 'naturally'
with a newline due to () and such that's preferable to the use of the
backslash.
* I see you paired for instance startElementNS and endElementNS together
without an empty line between them. I'd like to stick to empty lines
between all method definitions.
* I prefer the use of the unicode literal u'' to a barebones unicode()
call. I think it's a bit clearer to use the literal for this.
Now on to more details of the code, focusing on the SaxTreeBuilder class
for now (I'll look at saxify later).
* I presume that due to the namespace reconciliation when adding
elements repeating the ns_mapping each time you create an element is
safe and won't result in a lot of namespace definitions in the
serialized XML, correct?
* This code is *almost* compatible with ElementTree proper, except for
the namespaces bit. For future potential reusability it might be nicer
indeed to put it in a separate module instead of integrating it into
etree.pyx.
* Hm, wouldn't characters() be a bit nicer like this? (untested)
parent = self._element_stack[-1]
try:
# the text comes after another element, thus tail
element = parent[-1]
element.tail = (element.tail or u'') + data
except IndexError:
# the text is before any element in the parent, thus text
parent.text = (parent.text or u'') + data
* while this shouldn't matter normally, it might be nice to stick with
the argument names for the content handler defined in the Python
library, like name and qname.
* I prefer tuple unpacking of the 'tag_tuple' ('name'), like this. I
also don't think it's necessary to have a template for such a simple
thing as "{%s}%s"; repeating it a few times is actually less text than
using the template... No need to make it a unicode string either, though
it doesn't matter much in this case. (It would matter more for plain
ElementTree -- this stores ascii strings where possible, so *not*
forcing things into unicode might help it slightly, though I think SAX
delivers unicode always so it wouldn't matter after all..). Also look at
the bug I think I found (see comment).
ns_uri, localname = name
and then:
if ns_uri:
el_name = '{%s}%s' % (ns_uri, localname)
elif self._default_ns:
# order seems reversed in your code. Is this a bug or my
# misunderstanding? Would be nice to have a test which
# exercises this code!
el_name = '{%s}%s' % (self._default_ns, localname)
else:
el_name = localname
Anyway, going home now and not entirely done with the review yet. Hope
this was useful! Lots of this was about minor matters of taste and style
so far, but it's good to have a uniform style in lxml...
Regards,
Martijn