Re: [lxml-dev] finding the line number of a parsed element

16 Mar 2007


      On Friday 16 March 2007 11:49, Stefan Behnel wrote:
...
Hi,
Stephan Richter wrote:
...
I have recently reimplemented RML (Reportlab's XML format to generate
PDFs) using lxml. All works well.
Interesting. Any chance you could provide a link?
Sure: http://svn.zope.org/z3c.rml/trunk/src/z3c/rml/
...
...
Now, I would like to give my users some more information when an error
occurs. For a pure XML parsing error, everything is fine (though I found
the failure points hard to interpret at times). But what if the XML
parses correctly, but while working with the element tree an error
occurs? In this case I would like to tell the user not only the error
message, but also the line/column and filename of the point of failure.
This sounds a lot like a problem you could try to solve with validation.
No, I cannot, since some stuff cannot be decided until I do Python calls. For 
example, I look up colors by names, but this is not a static list.
...
...
Ideally I would have the filename, start row and start column of each
element available as part of the etree Element. I have tried to find this
information or hooks for it, unsuccessfully.
There is no API for it, but internally, we have this information for parsed
trees, at least the line number - note that exceptions contain the line
number already. So we could easily add a property "_line" to elements that
returns the line number at which the element was parsed (*if* it was
parsed). I don't much like the fact that libxml2 puts a zero there if
the node was created by hand, but I assume that is not too much of a
problem either.
I think a zero is no problem. None would be better. :-)
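
For illustration, here is a minimal sketch of the "None instead of zero" behaviour discussed above. It is not lxml's API: it uses the standard library's expat and ElementTree modules as a stand-in, and the names parse_with_lines and line_of are hypothetical. Line numbers live in a side table, so an element created by hand simply has no entry and yields None.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


def parse_with_lines(xml_text):
    """Build an ElementTree and a side table mapping element -> start line."""
    builder = ET.TreeBuilder()
    lines = {}
    parser = xml.parsers.expat.ParserCreate()

    def start(tag, attrs):
        # TreeBuilder.start() returns the element it just opened.
        elem = builder.start(tag, attrs)
        # expat reports the 1-based line of the event being handled.
        lines[elem] = parser.CurrentLineNumber

    parser.StartElementHandler = start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data
    parser.Parse(xml_text, True)
    return builder.close(), lines


def line_of(element, lines):
    """Return the line an element was parsed at, or None if it was not parsed."""
    return lines.get(element)


root, lines = parse_with_lines("<document>\n  <para/>\n</document>")
print(line_of(root[0], lines))             # line of the parsed <para>
print(line_of(ET.Element("para"), lines))  # hand-made element: None
```

A side table keeps the sketch independent of any element class; a real "_line" property would presumably read the value stored on the node itself.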
...
I personally prefer "_line" over "line", as this only applies to parsed
elements, not all of them, so this is more of a half-working API.
That would be perfect.
...
Additionally, any additional attribute there goes off the list of children
accessible in objectify.
I don't understand this sentence. :-)
...
We could also consider adding an external utility module to provide helpers
like this that are not really worth polluting the API. Something like
lxml.tools.lineof(element)
That would be icing on the cake; either way is fine. If you consider such a
tool, I would probably call it "parseInfo" or so, where maybe the filename, 
endline, and column info is available too.
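
To make the "parseInfo" idea concrete, here is a rough sketch of what such a record could carry and how it would help with the color lookup case mentioned earlier. Everything here is an assumption for illustration: ParseInfo, Record, LocatedError, parse_records and resolve_colors are hypothetical names, the sample file name is made up, and the standard library's expat parser stands in for lxml.

```python
import xml.parsers.expat
from collections import namedtuple

# Hypothetical records; "ParseInfo" follows the name suggested above.
ParseInfo = namedtuple("ParseInfo", "filename line column")
Record = namedtuple("Record", "tag attrs info")


class LocatedError(Exception):
    """An error that remembers where the offending element was parsed."""

    def __init__(self, message, info):
        super().__init__("%s (file %s, line %d, column %d)"
                         % (message, info.filename, info.line, info.column))
        self.info = info


def parse_records(xml_text, filename):
    """Parse xml_text and return one Record per start tag, in document order."""
    records = []
    parser = xml.parsers.expat.ParserCreate()

    def start(tag, attrs):
        # Position of the start tag currently being reported by expat.
        info = ParseInfo(filename, parser.CurrentLineNumber,
                         parser.CurrentColumnNumber)
        records.append(Record(tag, attrs, info))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return records


def resolve_colors(records, known_colors):
    """Post-parse step: fail with a location if a color name is unknown."""
    for rec in records:
        color = rec.attrs.get("color")
        if color is not None and color not in known_colors:
            raise LocatedError("unknown color %r" % color, rec.info)


doc = "<document>\n  <para color='red'/>\n  <para color='mauve'/>\n</document>"
records = parse_records(doc, "example.rml")
try:
    # The set stands in for a color registry that is only known at run time.
    resolve_colors(records, known_colors={"red", "green"})
except LocatedError as err:
    print(err)
```

The XML itself is well-formed, so the parser has nothing to complain about; the error only appears in the later Python step, which is exactly where the stored position becomes useful.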
...
Any comments?
How fast can you do this? :-)

Regards,
Stephan
-- 
Stephan Richter
CBU Physics & Chemistry (B.S.) / Tufts Physics (Ph.D. student)
Web2k - Web Software Design, Development and Training