Re: [lxml-dev] DOM2 range() support?
Gloria W wrote:
Stefan Behnel wrote:
This is actually the first time I've come across the concept of DOM2 ranges. Isn't that simply a tuple of two positions, where each position contains an Element and, optionally, one of the following:
- an attribute "{ns}name" and a string position in the attribute value
- a string position in the text
- a string position in the tail
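To make the proposed structure concrete, here is a minimal sketch of such a position and range in plain Python. It assumes the ElementTree API (`text`, `tail`, `get()`), which `lxml.etree` also provides, and uses the stdlib `xml.etree.ElementTree` so it runs without lxml. `Position`, `Range` and `covered_text()` are hypothetical names, not part of lxml; a slot of `None` stands for a position directly on the element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Position:
    """One boundary point: an element plus an optional string offset (hypothetical)."""
    element: ET.Element
    slot: Optional[str] = None   # None = on the element itself; else "text", "tail" or "attrib"
    offset: int = 0              # character offset into the chosen string
    attr: Optional[str] = None   # attribute name ("{ns}name") when slot == "attrib"

    def string(self) -> str:
        """Return the string that `offset` indexes into."""
        if self.slot == "text":
            return self.element.text or ""
        if self.slot == "tail":
            return self.element.tail or ""
        if self.slot == "attrib":
            return self.element.get(self.attr, "")
        raise ValueError("this position sits on the element, not inside a string")


# A range is just a pair of positions.
Range = Tuple[Position, Position]


def covered_text(rng: Range) -> str:
    """Text covered by a range whose two ends lie in the same string (the simple case)."""
    start, end = rng
    if (start.element is not end.element or start.slot != end.slot
            or start.attr != end.attr):
        raise ValueError("this sketch only handles ranges within a single string")
    return start.string()[start.offset:end.offset]
```

For example, with `root = ET.fromstring('<p>Hello <b>big</b> world</p>')` and `b = root[0]`, the range `(Position(b, "tail", 1), Position(b, "tail", 6))` covers `"world"`. Ranges that span several strings are deliberately left out here.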
Yes, I think so. The time-consuming part of the implementation is that a range can address any position in any component of the DOM structure, without restriction. This means accessing characters anywhere in tags, attribute values, etc. For example, a DOM range can begin at the 'r' and end at the 'x' in '<a href="x">'.
Hmm, but honestly, that usage pattern isn't very likely, right? I mean, I would consider putting the index /on/ an element (start and/or end) or somewhere in textual content to be the only useful patterns anyway. Wouldn't such an implementation be much simpler and still largely sufficient? You can always tell users to stick to that...
The DOM structure has to be treatable as a node structure and as a string index at the same time. The implementation has to interpret changes to the string and translate them into node changes, which is the very tedious, time-consuming part of implementing this feature.
Wait, does this mean adding 20 to the range index could make you end up on a different node? Say, switch from an element to an attribute, or maybe from an element to its closing parent? Who designs these things??? And: is a string index defined on the original encoding, or on the Unicode content of the infoset? (I *hope* the latter.)
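One way to see the "different node" effect is to treat the tree's character data as one flat string and resolve a global offset back to a node. The sketch below assumes that offsets count Unicode code points of the infoset's text content (the interpretation hoped for above), and it only covers text and tail content, ignoring markup and attribute values. `text_segments()` and `resolve()` are hypothetical helpers, written against the stdlib `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def text_segments(elem: ET.Element) -> Iterator[Tuple[ET.Element, str, str]]:
    """Yield (element, slot, string) for every non-empty text/tail string in document order."""
    if elem.text:
        yield elem, "text", elem.text
    for child in elem:
        yield from text_segments(child)
        if child.tail:
            yield child, "tail", child.tail


def resolve(root: ET.Element, offset: int) -> Tuple[ET.Element, str, int]:
    """Map a global offset into the tree's text content to (element, slot, local offset).

    Offsets count Python str characters (code points), not encoded bytes.
    """
    pos = 0
    for elem, slot, s in text_segments(root):
        if offset < pos + len(s):
            return elem, slot, offset - pos
        pos += len(s)
    raise IndexError("offset beyond end of text content")
```

With `<p>Hello <b>big</b> world</p>`, offset 2 resolves into the text of `<p>`, while offset 7 (2 + 5) already lands in the text of `<b>`, and offset 12 in the tail of `<b>`. An offset exactly at the end of one string is assigned to the start of the next one, which is one of the ambiguities a real implementation would have to settle.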
Admittedly, the DOM2 interface on top of that is a little more complex and (should I say it?) DOM-ishly obfuscated, but there shouldn't be more to it than that, right? I mean, the respective W3C spec is only some 13 sections long; there *can't* be more to it than that. :)
8) I would be so happy with just a fully functional range() implementation. I'm sure there's more to it, but developers can hopefully contribute as they need these features on the server side.
As I said, go for the common cases first.
Hmm, now that we have cool HTML support and CSS selection, I wouldn't mind having a Range class hanging around in some (lxml.range?) module. It doesn't even sound like you'd have to implement it in Pyrex; plain Python code should be enough here.
I think so as well. Regex should be enough.
Well, given what you describe above, I'd rather traverse the tree to get along than work on the completely serialised representation, and then serialise each single element, attribute, etc. individually to see where we end up.
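Along the lines of walking the tree rather than the serialised document, a hypothetical pure-Python range class could collect character data in document order between two boundaries. This is only a sketch: it restricts boundaries to text and tail content (the common cases), does not handle attributes or positions inside markup, and `TextRange` is an invented name, not an existing lxml module or API. It uses the stdlib `xml.etree.ElementTree`; `lxml.etree` elements expose the same `text`/`tail` attributes and child iteration, although with lxml, comments and processing instructions also appear as children and would need to be skipped.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# A boundary is (element, slot, offset) with slot "text" or "tail".
Boundary = Tuple[ET.Element, str, int]


class TextRange:
    """Hypothetical pure-Python range over an ElementTree-style tree (not an lxml API)."""

    def __init__(self, root: ET.Element, start: Boundary, end: Boundary) -> None:
        self.root, self.start, self.end = root, start, end

    def _slots(self, elem: ET.Element) -> Iterator[Tuple[ET.Element, str, str]]:
        # Walk the tree in document order, yielding every text and tail slot,
        # including empty ones so that a boundary may point into them.
        yield elem, "text", elem.text or ""
        for child in elem:
            yield from self._slots(child)
            yield child, "tail", child.tail or ""

    def text(self) -> str:
        """Concatenate the character data between the start and end boundaries."""
        parts = []
        inside = False
        for elem, slot, s in self._slots(self.root):
            begin = 0
            if elem is self.start[0] and slot == self.start[1]:
                inside, begin = True, self.start[2]
            if elem is self.end[0] and slot == self.end[1]:
                if not inside:
                    raise ValueError("end boundary precedes start boundary")
                parts.append(s[begin:self.end[2]])
                return "".join(parts)
            if inside:
                parts.append(s[begin:])
        raise ValueError("boundary not found under root")
```

For `<p>Hello <b>big</b> world</p>`, a range from offset 2 in the text of `<p>` to offset 3 in the tail of `<b>` yields `"llo big wo"`. Mutating operations (delete, insert, surround) would build on the same document-order walk.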
I would have liked to do it, but I don't have the spare time right now.
Well, that's usually a show-stopper in the open-source world. It's either time or money (but at least it's not *always* money).

Stefan
Stefan Behnel