Mailman 3 [lxml-dev] advantages of libxml - lxml - The Python XML Toolkit

[lxml-dev] advantages of libxml

Slou

4 Jul 2005

6:49 a.m.

Hi!

Here are several features I miss in lxml:

- ability to get namespace of the node without parsing .tag property (which is concatenated on library side, so double amount of meaningless work is done)
- ability to get short name of the node
- ability to get parent node without .xpath('../')[0] which seems overkill to me
- ability to get absolute xpath of node

I may be wrong and some things that I want may be inconsistent with lxml design and usage patterns, correct me in this case.

Anyway lxml is the best xml-processing library for Python, thanks for your work!


Martijn Faassen

4 Jul

8:19 a.m.

Slou wrote:

> Here are several features I miss in lxml:
>
> - ability to get namespace of the node without parsing .tag property
>   (which is concatenated on library side, so double amount of meaningless work is done)

There's a 'prefix' property now in the trunk version. It should be easy enough to add a namespace URI property as well.

> - ability to get short name of the node

This would be the local name, right? Shouldn't be too hard to add a localname property as well.
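For reference, the ".tag parsing" being discussed looks roughly like the sketch below. It assumes the ElementTree convention of storing qualified names as `{uri}local`, and it uses the standard library's `xml.etree.ElementTree` so that it runs without lxml installed. The helper name `split_tag` and the example namespace URI are made up for illustration.

```python
import xml.etree.ElementTree as ET


def split_tag(tag):
    """Split an ElementTree tag in '{uri}local' form into (uri, local).

    Tags without a namespace yield (None, tag).
    """
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag


# Hypothetical document: one namespaced root, one un-namespaced child.
root = ET.fromstring('<a:doc xmlns:a="http://example.com/ns"><plain/></a:doc>')
print(split_tag(root.tag))     # ('http://example.com/ns', 'doc')
print(split_tag(root[0].tag))  # (None, 'plain')
```

Read-only namespace and localname properties on the element would make a helper like this unnecessary on the caller's side.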

> - ability to get parent node without .xpath('../')[0] which seems overkill to me

Yeah, though I'm a bit wary of extending the ElementTree API so fundamentally. Since we have the ability with libxml2, we should exploit it though, I guess.
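As an illustration of what plain ElementTree code has to do when elements carry no parent pointer, here is a minimal sketch that builds a child-to-parent map up front. It uses the standard library's `xml.etree.ElementTree`. The helper name `build_parent_map` is an assumption for this example, and the map has to be rebuilt whenever the tree changes.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    """Map every element below root to its parent (root itself is absent)."""
    return {child: parent for parent in root.iter() for child in parent}


root = ET.fromstring("<doc><section><para/></section></doc>")
parents = build_parent_map(root)
para = root.find("section/para")
print(parents[para].tag)  # section
print(parents.get(root))  # None: the root has no parent element
```

A parent accessor backed by libxml2's own parent link would avoid both the extra pass over the tree and the bookkeeping.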

> - ability to get absolute xpath of node

This one I need to think about; I believe libxml2 has a facility for this, but research would need to be done. If you can find out the API in libxml2 and submit a patch, that'd be great!

> I may be wrong and some things that I want may be inconsistent with lxml design and usage patterns, correct me in this case.
>
> Anyway lxml is the best xml-processing library for Python, thanks for your work!

Thanks for the feedback! I'll consider implementing your suggestions. It shouldn't be too hard to implement a bunch of read-only properties for this. I'm wary to make them writeable, as that might involve DOM-like complexity, but read-only should be simple.

The 'getting the xpath expression of a node' requires some more puzzling though; let's talk about this more.

Regards,

Martijn

Slou

5 Jul

3:26 a.m.

Thanks for your response )

> > - ability to get parent node without .xpath('../')[0] which seems overkill to me
>
> Yeah, though I'm a bit wary of extending the ElementTree API so fundamentally. Since we have the ability with libxml2, we should exploit it though, I guess.

ElementTree does not have it because, by implementation, one element could be a subelement of different parents, I guess. libxml2 has a strict limitation in that case. Maybe we can add .find('..') as an alternative? (Although I would not need that functionality if we had the ability to get the absolute xpath.)

> > - ability to get absolute xpath of node
>
> This one I need to think about; I believe libxml2 has a facility for this, but research would need to be done. If you can find out the API in libxml2 and submit a patch, that'd be great!

AFAIK there is no one-API-function-call way to do it, but I need that strongly and I would try to implement it anyway. Maybe you can give some tips on it?

Kasimier Buchcik

3:56 a.m.

Hi,

On Tue, 2005-07-05 at 14:26 +0400, Slou wrote:

> > > - ability to get absolute xpath of node
> >
> > This one I need to think about; I believe libxml2 has a facility for this, but research would need to be done. If you can find out the API in libxml2 and submit a patch, that'd be great!
>
> AFAIK there is no one-API-function-call way to do it, but I need that strongly and I would try to implement it anyway.
>
> Maybe you can give some tips on it?

Something like this (pseudo code, attrs not handled):

    xpath = ""
    pos = 1
    while node != NULL and node != document:
        if node.prev == NULL:
            if xpath != "":
                xpath = "/" + xpath
            xpath = "node()[pos]" + xpath
            node = node.parent
            pos = 1
        else:
            pos = pos + 1
            node = node.prev
    if xpath != "":
        xpath = "/" + xpath

Regards,

Kasimier
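A runnable reading of the pseudo code above might look like the following sketch. The `Node` class is a hypothetical stand-in that models only the `parent` and `prev` links a libxml2 node carries. It is not lxml's or libxml2's API, and like the pseudo code it ignores attributes. The sketch restructures the walk as one inner loop per step instead of one flat loop, but should yield the same positional path.

```python
class Node:
    """Hypothetical stand-in for a tree node with parent/prev links only."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.prev = None
        self.children = []
        if parent is not None:
            if parent.children:
                self.prev = parent.children[-1]  # link to previous sibling
            parent.children.append(self)


def absolute_path(node, document):
    """Return a positional path such as '/node()[1]/node()[3]' for node."""
    steps = []
    while node is not None and node is not document:
        pos = 1
        sibling = node.prev
        while sibling is not None:  # count the preceding siblings
            pos += 1
            sibling = sibling.prev
        steps.append("node()[%d]" % pos)
        node = node.parent
    return ("/" + "/".join(reversed(steps))) if steps else ""


doc = Node("#document")
root = Node("root", doc)
a, b, c = Node("a", root), Node("b", root), Node("c", root)
print(absolute_path(c, doc))     # /node()[1]/node()[3]
print(absolute_path(root, doc))  # /node()[1]
```

Each step costs one walk over the preceding siblings, so the total work grows with the depth of the node and the number of siblings before each ancestor.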

Olivier Grisel

5:22 a.m.

New subject: [lxml-dev] Re: advantages of libxml

Kasimier Buchcik wrote:

>     xpath = ""
>     pos = 1
>     while node != NULL and node != document:
>         if node.prev == NULL:
>             if xpath != "":
>                 xpath = "/" + xpath
>             xpath = "node()[pos]" + xpath
>             node = node.parent
>             pos = 1
>         else:
>             pos = pos + 1
>             node = node.prev
>     if xpath != "":
>         xpath = "/" + xpath

Maybe you could cache it as an Element property, so that this path is only computed on the first access?

Best,

-- Olivier

Kasimier Buchcik

5:31 a.m.

New subject: [lxml-dev] Re: advantages of libxml

Hi,

On Tue, 2005-07-05 at 14:22 +0200, Olivier Grisel wrote:

> Kasimier Buchcik wrote:
>
> >     xpath = ""
> >     pos = 1
> >     while node != NULL and node != document:
> >         if node.prev == NULL:
> >             if xpath != "":
> >                 xpath = "/" + xpath
> >             xpath = "node()[pos]" + xpath
> >             node = node.parent
> >             pos = 1
> >         else:
> >             pos = pos + 1
> >             node = node.prev
> >     if xpath != "":
> >         xpath = "/" + xpath
>
> Maybe you could cache it as an Element property, so that this path is only computed on the first access ?

You would need to invalidate such a cache for every change in the "preceding" axis.

Regards,

Kasimier
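To make the invalidation concern concrete, here is a small sketch showing a stored path going stale after a sibling is inserted in front of the element. It uses the standard library's `xml.etree.ElementTree` and counts elements only (`*[n]` steps rather than `node()[n]`). The helper name `positional_path` is an assumption for this example.

```python
import xml.etree.ElementTree as ET


def positional_path(element, root):
    """Build a path like '/*[1]/*[2]' by walking a freshly built parent map."""
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    while element is not root:
        parent = parents[element]
        steps.append("*[%d]" % (list(parent).index(element) + 1))
        element = parent
    steps.append("*[1]")  # the root element itself
    return "/" + "/".join(reversed(steps))


root = ET.fromstring("<doc><a/><b/></doc>")
b = root.find("b")
cached = positional_path(b, root)   # path stored "on first access"
root.insert(0, ET.Element("new"))   # a new preceding sibling appears
fresh = positional_path(b, root)    # path recomputed after the change
print(cached)  # /*[1]/*[2]
print(fresh)   # /*[1]/*[3]
```

The element `b` itself was never touched, yet its cached path no longer points at it, which is why a per-element cache would need a hook on every insertion or removal that precedes the element.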

Slou

5:28 a.m.

> > AFAIK there is no one-API-function-call way to do it, but I need that strongly and I would try to implement it anyway.
> >
> > Maybe you can give some tips on it?
>
> Something like this (pseudo code, attrs not handled):
>
>     xpath = ""
>     pos = 1
>     while node != NULL and node != document:
>         if node.prev == NULL:
>             if xpath != "":
>                 xpath = "/" + xpath
>             xpath = "node()[pos]" + xpath
>             node = node.parent
>             pos = 1
>         else:
>             pos = pos + 1
>             node = node.prev
>     if xpath != "":
>         xpath = "/" + xpath

Thanks, I understand it at this level. The question was about tricks that could be done using the existing codebase. Maybe (but unlikely) there is a property for the position of an element, or something similar.
