[lxml-dev] advantages of libxml
Hi!

Here are several features I miss in lxml:
- ability to get the namespace of the node without parsing the .tag property (which is concatenated on the library side, so twice the amount of meaningless work is done)
- ability to get the short name of the node
- ability to get the parent node without .xpath('..')[0], which seems like overkill to me
- ability to get the absolute XPath of the node

I may be wrong, and some of the things I want may be inconsistent with lxml's design and usage patterns; correct me if so. Anyway, lxml is the best XML-processing library for Python, thanks for your work!
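Until such properties exist, the namespace URI and the short (local) name can both be recovered by splitting the `{uri}local` string (Clark notation) that ElementTree and lxml use for `.tag`. A minimal sketch using only the standard library; `split_clark_tag` is a hypothetical helper, not part of either API, and it still does the string parsing the request would like to avoid:

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def split_clark_tag(tag: str) -> Tuple[Optional[str], str]:
    """Split an ElementTree-style tag '{uri}local' into (uri, local).

    A tag without a namespace comes back as (None, tag).
    """
    if tag.startswith("{"):
        uri, sep, local = tag[1:].partition("}")
        if sep:  # only treat it as Clark notation if the closing '}' is there
            return uri, local
    return None, tag


root = ET.fromstring('<x:doc xmlns:x="urn:example"><plain/></x:doc>')
print(split_clark_tag(root.tag))     # ('urn:example', 'doc')
print(split_clark_tag(root[0].tag))  # (None, 'plain')
```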
Slou wrote:
Here are several features I miss in lxml:
- ability to get the namespace of the node without parsing the .tag property
(which is concatenated on the library side, so twice the amount of meaningless work is done)
There's a 'prefix' property now in the trunk version. It should be easy enough to add a namespace URI property as well.
- ability to get the short name of the node
This would be the local name, right? Shouldn't be too hard to add a localname property as well.
- ability to get the parent node without .xpath('..')[0], which seems like overkill to me
Yeah, though I'm a bit wary of extending the ElementTree API so fundamentally. Since we have the ability with libxml2, we should exploit it though, I guess.
- ability to get the absolute XPath of the node
This one I need to think about; I believe libxml2 has a facility for this, but research would need to be done. If you can find out the API in libxml2 and submit a patch, that'd be great!
I may be wrong, and some of the things I want may be inconsistent with lxml's design and usage patterns; correct me if so.
Anyway, lxml is the best XML-processing library for Python, thanks for your work!
Thanks for the feedback! I'll consider implementing your suggestions. It shouldn't be too hard to implement a bunch of read-only properties for this. I'm wary of making them writable, as that might involve DOM-like complexity, but read-only should be simple. Getting the XPath expression of a node requires some more puzzling, though; let's talk about this more.

Regards,
Martijn
Thanks for your response :)
- ability to get the parent node without .xpath('..')[0], which seems like overkill to me
Yeah, though I'm a bit wary of extending the ElementTree API so fundamentally. Since we have the ability with libxml2, we should exploit it though, I guess.
ElementTree does not have it because, I guess, in its implementation one element can be a subelement of several different parents. libxml2 has a strict limitation there: each node has a single parent. Maybe we can add .find('..') as an alternative? (Although I would not need that functionality if we had the ability to get the absolute XPath.)
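In plain ElementTree the usual workaround is to walk the tree once and remember each element's parent. A minimal sketch with the standard library's `xml.etree.ElementTree` (`build_parent_map` is a hypothetical helper name). Note that in the standard library, `.find('..')` called on an element returns `None`, because its path support will not climb above the element the search starts from:

```python
import xml.etree.ElementTree as ET


def build_parent_map(root: ET.Element) -> dict:
    """Map every element below `root` to its parent in one pass.

    ElementTree elements store no parent pointer, so the map is built from
    the top down. It goes stale as soon as the tree is modified.
    """
    return {child: parent for parent in root.iter() for child in parent}


root = ET.fromstring("<a><b><c/></b><b/></a>")
parents = build_parent_map(root)
c = root.find("b/c")
print(parents[c].tag)     # b
print(parents.get(root))  # None: the root has no parent in the map
print(c.find(".."))       # None: '..' cannot climb above the start element
```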
- ability to get the absolute XPath of the node
This one I need to think about; I believe libxml2 has a facility for this, but research would need to be done. If you can find out the API in libxml2 and submit a patch, that'd be great!
AFAIK there is no single-API-call way to do it. But I need it badly and will try to implement it anyway. Maybe you can give me some tips?
Hi,

On Tue, 2005-07-05 at 14:26 +0400, Slou wrote:
AFAIK there is no single-API-call way to do it. But I need it badly and will try to implement it anyway.
Maybe you can give me some tips?
Something like this (pseudo code, attrs not handled):

    xpath = ""
    pos = 1
    while node != NULL and node != document:
        if node.prev == NULL:
            if xpath != "":
                xpath = "/" + xpath
            xpath = "node()[pos]" + xpath
            node = node.parent
            pos = 1
        else:
            pos = pos + 1
            node = node.prev
    if xpath != "":
        xpath = "/" + xpath

Regards,
Kasimier
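Kasimier's pseudo code translates almost line for line into Python. The sketch below assumes a hypothetical `Node` class that models only the `parent` and `prev` links a libxml2 `xmlNode` carries (it is not the lxml or libxml2 API), and it writes the actual value of `pos` into each step where the pseudo code says `node()[pos]`. Like the original, it ignores attributes:

```python
from typing import List, Optional


class Node:
    """Hypothetical stand-in for a libxml2 xmlNode: only parent/prev links."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.prev: Optional["Node"] = None
        self.children: List["Node"] = []
        if parent is not None:
            if parent.children:
                self.prev = parent.children[-1]  # link to the previous sibling
            parent.children.append(self)


def absolute_path(node: Optional[Node], document: Node) -> str:
    """Walk back over preceding siblings, then climb to the parent, level by level."""
    xpath = ""
    pos = 1
    while node is not None and node is not document:
        if node.prev is None:
            # First sibling reached: pos is now this level's 1-based position.
            if xpath:
                xpath = "/" + xpath
            xpath = f"node()[{pos}]" + xpath
            node = node.parent
            pos = 1
        else:
            pos += 1
            node = node.prev
    if xpath:
        xpath = "/" + xpath
    return xpath


doc = Node("#document")
root = Node("root", doc)
Node("#text", root)  # a text node before <a>
a = Node("a", root)
b = Node("b", a)
print(absolute_path(b, doc))  # /node()[1]/node()[2]/node()[1]
```

Because `node()` also matches text nodes and comments, every sibling counts toward the position; that is why the text node in the example pushes `<a>` to `node()[2]`.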
Kasimier Buchcik wrote:
    xpath = ""
    pos = 1
    while node != NULL and node != document:
        if node.prev == NULL:
            if xpath != "":
                xpath = "/" + xpath
            xpath = "node()[pos]" + xpath
            node = node.parent
            pos = 1
        else:
            pos = pos + 1
            node = node.prev
    if xpath != "":
        xpath = "/" + xpath
Maybe you could cache it as an Element property, so that this path is only computed on the first access?

Best,
--
Olivier
Hi,

On Tue, 2005-07-05 at 14:22 +0200, Olivier Grisel wrote:
Maybe you could cache it as an Element property, so that this path is only computed on the first access?
You would need to invalidate such a cache for every change in the "preceding" axis.

Regards,
Kasimier
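Kasimier's point can be made concrete. The simplest scheme that is always correct is coarser than tracking the preceding axis: keep one counter per document, bump it on every mutation, and treat a cached path as valid only if it was computed at the current counter value. A minimal sketch with hypothetical `Document` and `Element` classes (not the lxml API), assuming the root element is the document's only child:

```python
from typing import List, Optional, Tuple


class Document:
    """Hypothetical document: a counter bumped on every tree mutation."""

    def __init__(self) -> None:
        self.version = 0


class Element:
    """Hypothetical element that caches its absolute path per document version."""

    def __init__(self, doc: Document, parent: Optional["Element"] = None) -> None:
        self.doc = doc
        self.parent = parent
        self.children: List["Element"] = []
        self._cache: Optional[Tuple[int, str]] = None  # (version, path)
        if parent is not None:
            parent.children.append(self)
            doc.version += 1  # a mutation: every cached path becomes invalid

    def remove(self) -> None:
        """Detach from the parent; later siblings shift, so bump the version."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None  # the detached element now behaves like a root
            self.doc.version += 1

    @property
    def path(self) -> str:
        if self._cache is not None and self._cache[0] == self.doc.version:
            return self._cache[1]  # nothing changed since the last computation
        steps = []
        node = self
        while node.parent is not None:
            # 1-based position among the parent's children, as in XPath.
            steps.append(f"node()[{node.parent.children.index(node) + 1}]")
            node = node.parent
        steps.append("node()[1]")  # assumption: the root is the document's only child
        path = "/" + "/".join(reversed(steps))
        self._cache = (self.doc.version, path)
        return path


doc = Document()
root = Element(doc)
a = Element(doc, root)
b = Element(doc, root)
print(b.path)  # /node()[1]/node()[2]
a.remove()
print(b.path)  # /node()[1]/node()[1]
```

The trade-off is that any edit anywhere discards every cached path, even for elements the edit could not have affected; invalidating only on changes in the preceding axis would keep more cache hits at the cost of more bookkeeping per mutation.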
AFAIK there is no single-API-call way to do it. But I need it badly and will try to implement it anyway.
Maybe you can give me some tips?
Something like this (pseudo code, attrs not handled):
    xpath = ""
    pos = 1
    while node != NULL and node != document:
        if node.prev == NULL:
            if xpath != "":
                xpath = "/" + xpath
            xpath = "node()[pos]" + xpath
            node = node.parent
            pos = 1
        else:
            pos = pos + 1
            node = node.prev
    if xpath != "":
        xpath = "/" + xpath
Thanks, I understand it at this level. My question was about tricks that could be done using the existing codebase. Maybe (though it's unlikely) there is a property for the position of an element, or something similar.
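Lacking a position property, one trick with what the standard library's ElementTree already provides is to build a parent map once and count same-tag siblings on the way up. A minimal sketch; `element_path` is a hypothetical helper, and its tag-based steps differ from the `node()` steps in Kasimier's pseudo code:

```python
import xml.etree.ElementTree as ET


def element_path(root: ET.Element, target: ET.Element) -> str:
    """Build an absolute path such as /a/b[2]/d[1] for `target` below `root`.

    Positions count only sibling elements with the same tag, so text and
    comments are ignored (unlike node() steps). Namespaced tags come out in
    ElementTree's '{uri}local' form, which is not valid XPath as-is.
    """
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [sibling for sibling in parent if sibling.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


root = ET.fromstring("<a><b/><c/><b><d/></b></a>")
d = root.find("b/d")
print(element_path(root, d))  # /a/b[2]/d[1]
```

ElementTree's own limited path syntax accepts the same positional steps, so `root.find("b[2]/d[1]")` finds the same `<d>` again.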
participants (4)
- Kasimier Buchcik
- Martijn Faassen
- Olivier Grisel
- Slou