running xpath query on result of an xpath query gives results which were excluded by the first xpath query
Will start by saying I am sure this is documented somehow and I probably just don't know the right word to find that documentation.

I'm trying to query a table from a document, after which I will query details from that table. The issue I have is that the second xpath query is pulling results from the entire document.

As an example, I have:

```
from lxml import html

exampledocument="""
<head> example </head>
<body>
<table><tbody>
<tr><th>exampleCellNotToFind</th>
</tbody></table>
<table id="exampleTableToFind"><tbody>
<tr><th>exampleCellToFind</th>
</tbody></table>
</body>
"""

example=html.fromstring(exampledocument)

# first query: pick the one table by its id
xpath1=example.xpath('//table[@id="exampleTableToFind"]', smart_strings=False)
# second query: run on the element returned by the first query
xpath2=xpath1[0].xpath('//table', smart_strings=False)
# workaround: serialize the element, re-parse it, then query
xpath3=html.fromstring(html.tostring(xpath1[0])).xpath('//table', smart_strings=False)
```

In this case, xpath2 includes both tables (xpath1 only includes 1 table), and I don't understand why. From reading the "XPath return values" section of https://lxml.de/xpathxslt.html#xpath I thought smart_strings=True would be the reason, but setting it to False didn't seem to change the output as far as I can tell.

I do have a workable solution in this html.fromstring(html.tostring()), but thought I should ask here to try to understand how these element type variables are working.

Thanks for any help.
Hi,

paths starting with / always start from the document root. They are not relative to the current context element.

What you probably want is ".//table" where the dot marks the current element as a search start.

lord of edges via lxml - The Python XML Toolkit wrote (at 2026-02-16 15:15 -0000):
The issue I have is that the second xpath query is pulling results from the entire document.
You're not going to find it in the lxml documentation because it's basic XPath semantics: a `/` at the start of the path means the path is absolute, so the search is anchored to the document root. See https://www.w3.org/TR/xpath-10/#location-paths:~:text=An%20absolute%20locati...
An absolute location path consists of / optionally followed by a relative location path. A / by itself selects the root node of the document containing the context node. If it is followed by a relative location path, then the location path selects the set of nodes that would be selected by the relative location path relative to the root node of the document containing the context node.
To search within the context node (the node from which you're invoking `xpath`) you can:
- use `self::node()` or `.` (an abbreviation for the former) as the leading step
- or follow the descendant (or descendant-or-self) axis explicitly, e.g. `descendant::table` should have the same effect as `.//table`
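The difference between the absolute and the relative forms can be checked with a short sketch. The document below is a tidied version of the example from the original post (the `<html>` wrapper and closing `</tr>` tags are added here for illustration), and the variable names are hypothetical.

```python
from lxml import html

# Tidied version of the example document from the original post.
doc = html.fromstring(
    '<html><body>'
    '<table><tbody><tr><th>exampleCellNotToFind</th></tr></tbody></table>'
    '<table id="exampleTableToFind"><tbody>'
    '<tr><th>exampleCellToFind</th></tr></tbody></table>'
    '</body></html>'
)

# The context node for the queries below.
table = doc.xpath('//table[@id="exampleTableToFind"]')[0]

# Leading "/" (or "//"): anchored at the document root, not at `table`.
absolute_rows = table.xpath('//tr')

# Leading ".": anchored at the context node.
relative_rows = table.xpath('.//tr')

# Explicit axis: also anchored at the context node.
axis_rows = table.xpath('descendant::tr')

print(len(absolute_rows), len(relative_rows), len(axis_rows))
```

Only the first query should see the row from the other table; the other two stay inside the selected table.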
Thanks, that makes a little bit more sense. I had understood that // was relative and / was absolute, but it seems to just be, at least to me, a really confusing description of what // is. For example, at https://www.w3schools.com/xml/xpath_syntax.asp there is "Selects nodes in the document from the current node that match the selection no matter where they are"; I'm not sure why it is relevant to select from the current node if you also select no matter where they are.

In case anyone else is dumb like me and manages to find this, I'll also mention that xpath1[0].xpath('.//table') doesn't return anything, but xpath1[0].xpath('.//tr') returns the one row that I want to see, while xpath1[0].xpath('//tr') returns the rows from both tables.

Google seems to think it is a stupid question, and I don't have a reason to need it now, but is there some way to exclude/delete elements/nodes outside the selection from the output/document?
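One way to read the empty `.//table` result: in XPath 1.0, `//` abbreviates `/descendant-or-self::node()/`, so `.//table` selects `table` children of the context node and of its descendants, which never includes the context node itself. A minimal sketch, assuming a tidied copy of the example document (the variable names are illustrative):

```python
from lxml import html

doc = html.fromstring(
    '<html><body>'
    '<table><tbody><tr><th>exampleCellNotToFind</th></tr></tbody></table>'
    '<table id="exampleTableToFind"><tbody>'
    '<tr><th>exampleCellToFind</th></tr></tbody></table>'
    '</body></html>'
)
table = doc.xpath('//table[@id="exampleTableToFind"]')[0]

# Only tables nested *inside* the context table; there are none here.
nested_tables = table.xpath('.//table')

# The descendant-or-self axis also considers the context node itself.
self_or_nested = table.xpath('descendant-or-self::table')

# Rows are descendants of the context table, so the relative path finds them.
rows = table.xpath('.//tr')

print(nested_tables, [t.get('id') for t in self_or_nested], len(rows))
```

If the selected element itself should be part of the result, `descendant-or-self::table` (or simply `.`) is the form to reach for.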
On 17 Feb 2026, at 16:31, lord of edges via lxml - The Python XML Toolkit wrote:
Google seems to think it is a stupid question, and I don't have a reason to need it now, but is there some way to exclude/delete elements/nodes outside the selection from the output/document?
It's not really clear what you're trying to do but, when working with HTML, it's often best to use the BeautifulSoup library. This can use lxml internally but is more expressive than working with XML directly, as in your use of xpath, especially with sequence indices.

Charlie
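On the question of dropping everything outside the selection, one possible approach (not suggested in the thread, so treat it as an assumption) is to deep-copy the selected element. This sketch relies on a deep-copied lxml element becoming the root of its own tree, which is worth verifying against the lxml version in use; the variable names are hypothetical.

```python
import copy

from lxml import html

doc = html.fromstring(
    '<html><body>'
    '<table><tbody><tr><th>exampleCellNotToFind</th></tr></tbody></table>'
    '<table id="exampleTableToFind"><tbody>'
    '<tr><th>exampleCellToFind</th></tr></tbody></table>'
    '</body></html>'
)
table = doc.xpath('//table[@id="exampleTableToFind"]')[0]

# Copy the subtree into a separate tree; the original document is untouched.
detached = copy.deepcopy(table)

# Absolute paths on the copy can only see the copied subtree.
tables_in_copy = detached.xpath('//table')
rows_in_copy = detached.xpath('//tr')

print(len(tables_in_copy), len(rows_in_copy), len(doc.xpath('//table')))
```

This has the same effect as the `html.fromstring(html.tostring(...))` round trip from the original post, without serializing and re-parsing.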
participants (4)
- Charlie Clark
- Holger Klawitter
- lord of edges
- Xavier Morel