Hi Stefan,
thanks for your answer. :-)
On Thu, 21 May 2015 10:48:18 +0200, Stefan Behnel wrote:
[...]
When I run the above code, I get "4" as a result. This is a bit unexpected.
It seems that root.sourceline returns the line number where the start tag _ends_. However, I need the line number where <article> _starts_ (line 2 in this example).
It seems that this behaviour applies only to the root element, though:
"""
In [15]: source = '''<?xml version="1.0"?>
   ....: <article
   ....: xmlns="http://docbook.org/ns/docbook"
   ....: xmlns:xlink="http://www.w3.org/1999/xlink">
   ....: <title>...
   ....: </title>
   ....: <para>
   ....: ...</para>
   ....: </article>
   ....: '''

In [16]: root = etree.fromstring(source)

In [17]: print(root.sourceline)
4

In [18]: print(root[0].sourceline)
5

In [19]: print(root[1].sourceline)
7
"""
Does this pose a problem in practice?
Well, yes, it is. :) For example, I need to remove the whole prolog (XML declaration, DOCTYPE, and optional comments) of an XML file. I know this sounds strange, but for the time being let's assume I have a valid reason. ;)

To remove the prolog, my idea was to get the line number of the root's start tag and strip everything before that line. Unfortunately, sourceline gives me the line number where the start tag _ends_, so cutting there would chop the start tag in half and leave the document syntactically incorrect. So my idea doesn't work.

Maybe there is a better method to remove the prolog of an XML file, but this is the only one I found. Any idea?

-- 
Gruß/Regards,
Thomas Schraitle
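One possible workaround, sketched here under assumptions rather than as an lxml feature: let the standard library's expat parser (not lxml/libxml2) locate the root element. Inside a callback, expat reports the position of the *first* character of the event, so in the start-element handler `CurrentLineNumber` and `CurrentByteIndex` point at the `<` of the root start tag instead of its end. Cutting the raw bytes at that offset removes the XML declaration, DOCTYPE and prolog comments while leaving the rest of the file byte-for-byte intact. The names `strip_prolog` and `_FoundRoot`, and the sample document, are hypothetical and made up for this sketch.

```python
import xml.parsers.expat


class _FoundRoot(Exception):
    """Raised from the handler to stop parsing once the root tag is seen."""


def strip_prolog(data):
    """Return (line, rest) for the XML document in ``data`` (bytes).

    ``line`` is the 1-based line on which the root start tag *begins*;
    ``rest`` is ``data`` from that tag's '<' onwards, i.e. the document
    without its XML declaration, DOCTYPE, and prolog comments/PIs.
    """
    parser = xml.parsers.expat.ParserCreate()
    found = {}

    def start_element(name, attrs):
        # Inside a callback, expat reports the position of the first
        # character of the event: here the '<' of the root start tag.
        found["line"] = parser.CurrentLineNumber
        found["offset"] = parser.CurrentByteIndex
        raise _FoundRoot()  # the rest of the document is not needed

    parser.StartElementHandler = start_element
    try:
        parser.Parse(data, True)
    except _FoundRoot:
        return found["line"], data[found["offset"]:]
    # Malformed input raises ExpatError above; a well-formed document
    # always has a root element, so this is only a safety net.
    raise ValueError("no root element found")


# Hypothetical sample: declaration, DOCTYPE and a comment before the root.
source = b'''<?xml version="1.0"?>
<!DOCTYPE article>
<!-- a prolog comment -->
<article
xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink">
<title>...</title>
</article>
'''

line, rest = strip_prolog(source)
print(line)                   # 4
print(rest.splitlines()[0])   # b'<article'
```

Two caveats apply to any "cut off the prolog" approach: if the DOCTYPE declares entities that the body references, those references become undefined once it is gone; and if the XML declaration named an encoding other than UTF-8 or UTF-16, the remaining bytes must be re-encoded (or given a new declaration) before they parse correctly.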