Looking for source preservation features in XML libs

Fredrik Lundh fredrik at pythonware.com
Tue Dec 28 16:57:18 CET 2004


Grzegorz Adam Hankiewicz wrote:

> I'm looking for two specific features in XML libraries. One is to be
> able to tell which source file line a tag starts and ends. Say, tag
> <para> is located on line 34 column 7, and the matching </para> three
> lines later on column 56.
>
> Another feature is to be able to save the processed XML code in a way
> that unmodified tags preserve the original indentation. Or in the worst
> case, all indentation is lost, but I can control to some degree the
> outlook of the final XML output.
>
> I have looked at xml.minidom, elementtree and gnosis and haven't found any
> such features. Are there libs providing these?

here's a custom parser that adds a "lineno" attribute to element nodes:

from elementtree import XMLTreeBuilder

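class MyParser(XMLTreeBuilder.FancyTreeBuilder):
    def start(self, elem):
        # hook called for each start tag, after the element has been created;
        # self.lineno is set by parse() below before each line is fed
        elem.lineno = self.lineno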

def parse(file):
    # feed one line at a time, and keep track of the line number
    lineno = 1
    parser = MyParser()
    for line in open(file).readlines():
        parser.lineno = lineno
        parser.feed(line)
        lineno = lineno + 1
    return parser.close()

for elem in parse("samples/simple.xml").getiterator():
    print elem.tag, elem.lineno

(the FancyTreeBuilder is somewhat broken in 1.2.1 through 1.2.3, at least
if you're using Python 2.3 or later.  in other words, use ElementTree 1.2
or 1.2.4 if you want this to work.)
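
if you also need the column and the position of the closing tag, a different technique may be simpler: instead of feeding one line at a time, ask the underlying expat parser where it is. the sketch below is not the elementtree approach shown above; it assumes Python 3 and the standard xml.parsers.expat module, and tag_positions and SAMPLE are names made up for illustration:

```python
import xml.parsers.expat


def tag_positions(xml_text):
    """Return (event, tag, line, column) tuples for every start and end tag."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # inside a handler, the parser position refers to the first
        # character of the markup that triggered the event (the "<")
        events.append(("start", name,
                       parser.CurrentLineNumber, parser.CurrentColumnNumber))

    def on_end(name):
        events.append(("end", name,
                       parser.CurrentLineNumber, parser.CurrentColumnNumber))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events


# illustrative input only
SAMPLE = "<doc>\n  <para>hello\n  world</para>\n</doc>\n"

if __name__ == "__main__":
    for event in tag_positions(SAMPLE):
        print(*event)
```

note that expat counts lines from 1 but columns from 0, so add one to the column if you want editor-style positions.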

the standard elementtree writer may modify the tags, but it preserves all
whitespace around them; depending on what you mean by "indentation",
that may or may not be what you want.  (but if you want to preserve all
whitespace in an XML document, you shouldn't run it through an XML
parser...)
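
a small round trip shows what this means in practice. this is only a sketch, assuming Python 3 and the standard library's xml.etree.ElementTree (a descendant of the elementtree package, so details may differ from the 1.2 writer); SOURCE and roundtrip are names made up for illustration:

```python
import xml.etree.ElementTree as ET

# illustrative input: two-space indentation, single-quoted attribute
SOURCE = "<doc>\n  <para a='1'>x</para>\n</doc>"


def roundtrip(xml_text):
    """Parse a document and serialize it again as a string."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    result = roundtrip(SOURCE)
    print(result)
    # whitespace between the tags survives the round trip...
    print("\n  <para" in result)
    # ...but the tag itself is rewritten (attribute quotes are normalized)
    print(result == SOURCE)
```

the newline and indentation before <para> come back unchanged because they are stored as ordinary text on the tree, while the attribute quoting is whatever the writer chooses to emit.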

</F> 





