Stephan Richter wrote:
Ideally I would have the filename, start row and start column of each element available as part of the etree Element. I have tried to find this information, or hooks for it, without success.
There is no API for it, but internally we have this information for parsed trees, at least the line number. Note that exceptions already contain the line number. So we could easily add a property "_line" to elements that returns the line number at which the element was parsed (*if* it was parsed). I don't much like the fact that libxml2 puts a zero there if the node was created by hand, but I assume that is not too much of a problem either.
I think a zero is no problem. None would be better. :-)
Problem is: how would you distinguish 'parsed in line 0' from 'not parsed at all' in this case?
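To make the ambiguity concrete, here is a minimal sketch in plain Python. It assumes that parser line numbers are 1-based, so a raw 0 can only mean "not parsed"; if line 0 were a legitimate position, the mapping below would lose information. The name normalize_line is hypothetical and not part of lxml.

```python
from typing import Optional


def normalize_line(raw_line: int) -> Optional[int]:
    """Map a raw line value to None when it carries no parse position.

    Assumption: real line numbers start at 1, so 0 is a pure sentinel
    for "this node was created by hand".
    """
    if raw_line < 0:
        raise ValueError("line numbers cannot be negative")
    return raw_line if raw_line > 0 else None
```

With that assumption, normalize_line(0) gives None and any positive value passes through unchanged.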
Additionally, any additional attribute there goes off the list of children accessible in objectify.
I don't understand this sentence. :-)
I was talking about lxml.objectify that uses Python object attributes to access XML element children (sort of like data binding to an object tree). Every name that is used as a Python attribute of the _Element class shadows XML children that would otherwise be accessible under that name. Check out the objectify docs to see what I mean.
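The shadowing effect can be reproduced with a toy class in plain Python. This is only a sketch of the mechanism, not lxml.objectify's actual implementation; Node and NodeWithLine are hypothetical names. Child lookup goes through __getattr__, which Python only calls when normal attribute lookup fails, so any real attribute with the same name wins.

```python
class Node:
    """Toy stand-in for an objectify-style element (not lxml's class)."""

    def __init__(self, tag, children=()):
        self.tag = tag
        self._children = list(children)

    def __getattr__(self, name):
        # Reached only when no regular attribute called `name` exists.
        for child in self.__dict__.get("_children", ()):
            if child.tag == name:
                return child
        raise AttributeError(name)


class NodeWithLine(Node):
    # A real Python attribute: it hides any XML child tagged "line".
    line = 0


plain = Node("root", [Node("line"), Node("title")])
shadowed = NodeWithLine("root", [Node("line"), Node("title")])

print(plain.line.tag)      # the child element is reachable
print(shadowed.line)       # the class attribute is returned instead
print(shadowed.title.tag)  # unrelated children are unaffected
```

This is why a leading underscore (as in "_line") or an external helper is attractive: it keeps the attribute namespace free for children.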
We could also consider adding an external utility module to provide helpers like this that are not really worth polluting the API. Something like
That would be icing on the cake; either way is fine. If you consider such a tool, I would probably call it "parseInfo" or so, where maybe the filename, endline, and column info is available too.
The filename would be available from documents. I don't know what you mean by "endline" (the last line number?), and the parser column is not available from libxml2 (at least not once the parser has passed the element...).
So, what about an 'lxml.docinfo' module then that provides this kind of informational helper function? I was never really happy with the DocInfo class, so it might be a good idea to just move this kind of information to a separate module that people can use if they need it.
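As a rough illustration of the shape such a module could take, here is a sketch in plain Python. Everything in it is hypothetical: ToyElement stands in for a real element, and parse_info and ParseInfo are made-up names loosely following the "parseInfo" suggestion. The point is only that a module-level function can expose the data without adding any attribute to the element class. No column field is included, since that information is not available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParseInfo:
    """Parse position of an element; None means "not known"."""
    filename: Optional[str]
    line: Optional[int]


class ToyElement:
    """Hypothetical element that keeps parse data in private slots."""

    def __init__(self, tag, filename=None, line=None):
        self.tag = tag
        self._filename = filename
        self._line = line


def parse_info(element: ToyElement) -> ParseInfo:
    # Helper lives outside the element class, so no public attribute
    # name is taken away from the element itself.
    return ParseInfo(filename=element._filename, line=element._line)


parsed = ToyElement("item", filename="data.xml", line=12)
handmade = ToyElement("item")

print(parse_info(parsed))
print(parse_info(handmade))
```

Callers who need the information import the helper; everyone else sees an unchanged element API.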
I'm pretty confident that there is even more that we could provide at that level. And it would help us in keeping the already bigger-than-big-enough API of lxml at least a little smaller.