[lxml-dev] Updated parser API
Hi,

I updated the parser API according to the discussions (and the proposal of Fredrik) that we had in November. It now uses an XMLParser class that simply builds the libxml2 parse options in the constructor. I also added a global function "set_default_parser" that globally sets the default parser (options), or resets them if the supplied parser is None.

Although the internal implementation may change later on, I think it is better to have this API in place *now* (i.e. for 0.9), so that we can simply add more features (i.e. keyword arguments) later on without changing the API itself. Since we already discussed this, I applied it directly to the trunk.

Note, however, that currently not all parse options are backed by test cases. I added one that tests namespace stripping (in the new file test_parser.py), but considering the fact that most of the functionality is implemented entirely by libxml2, I (lazily) thought it's sufficient to test that the API works in general.

Stefan
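The design described in this message (a parser object whose constructor folds keyword arguments into one option bitmask, plus a global default that can be replaced or reset with None) might be sketched roughly as follows. This is a pure-Python illustration that does not touch lxml or libxml2: the flag names and values, the keyword names, and the get_default_parser helper are assumptions made up for the example, not the actual lxml code.

```python
# Illustrative option flags. Names and values are placeholders for this
# sketch and are NOT taken from libxml2's headers.
OPT_RECOVER = 1 << 0
OPT_LOAD_DTD = 1 << 1
OPT_NS_CLEAN = 1 << 2

# Each supported keyword argument is mapped to a flag by hand.
_KEYWORD_FLAGS = {
    "recover": OPT_RECOVER,
    "load_dtd": OPT_LOAD_DTD,
    "ns_clean": OPT_NS_CLEAN,
}


class XMLParser:
    """Toy stand-in: collects parse options into a single integer."""

    def __init__(self, **options):
        self.parse_options = 0
        for name, enabled in options.items():
            if name not in _KEYWORD_FLAGS:
                raise TypeError("unknown parser option: %r" % name)
            if enabled:
                self.parse_options |= _KEYWORD_FLAGS[name]


_BUILTIN_DEFAULT = XMLParser()
_default_parser = _BUILTIN_DEFAULT


def set_default_parser(parser=None):
    """Install `parser` as the global default, or reset it if None."""
    global _default_parser
    _default_parser = _BUILTIN_DEFAULT if parser is None else parser


def get_default_parser():
    """Hypothetical helper so the current default can be inspected."""
    return _default_parser
```

In a layout like this, supporting a further option later only means adding one entry to the keyword table; the constructor signature and the set_default_parser function stay the same.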
On 01.03.06 09:59:21, Stefan Behnel wrote:
I updated the parser API according to the discussions (and the proposal of Fredrik) that we had in November. It now uses an XMLParser class that simply builds the libxml2 parse options in the constructor. I also added a global function "set_default_parser" that globally sets the default parser (options), or resets them if the supplied parser is None.
Just a short question before I waste hours to try this out: Does this enable me to set "arbitrary" options on the XMLParser, so I could finally test the libxml-enhancement for removal of redundant namespaces (that is in CVS)? See http://bugzilla.gnome.org/show_bug.cgi?id=329347 for details.

If the answer is yes, how would I specify the XML_DOM_RECONNS_REMOVEREDUND option?

Andreas
--
You had some happiness once, but your parents moved away, and you had to leave it behind.
Andreas Pakulat wrote:
On 01.03.06 09:59:21, Stefan Behnel wrote:
I updated the parser API according to the discussions (and the proposal of Fredrik) that we had in November. It now uses an XMLParser class that simply builds the libxml2 parse options in the constructor. I also added a global function "set_default_parser" that globally sets the default parser (options), or resets them if the supplied parser is None.
Just a short question before I waste hours to try this out:
Does this enable me to set "arbitrary" options on the XMLParser, so I could finally test the libxml-enhancement for removal of redundant namespaces (that is in CVS)? See http://bugzilla.gnome.org/show_bug.cgi?id=329347 for details.
Yes and no (my language has the wonderful word "jein" for that and I'd love to use it in English, too). Yes: It allows you to set options on the parser. No: The options must be available at compile time and mapped to keyword arguments by hand. Remember, we're talking about options of a C API here.
If the answer is yes, how would I specify the XML_DOM_RECONNS_REMOVEREDUND option?
This is not a parser option. I think it would rather be an option for a serializer, right? Maybe not even that since it modifies the state of the XML structure... So, no, there isn't currently an API for that.

Maybe the best way to integrate the feature would be a method in ElementTree that explicitly traverses the tree to strip redundant declarations and possibly other relics from copying elements. Something like

    class _ElementTree:
        ...
        def cleanup(self):
            # call libxml2 cleanup functions on self._context_node

Since this is an experimental feature, it will not be supported in lxml 0.9 anyway. But if you could come up with a patch that implements it, it would allow us to integrate it later on and also help others who have the same problem and can afford to use a CVS version of libxml2.

Stefan
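To make the idea of "traversing the tree to strip redundant declarations" concrete, here is a small self-contained sketch on a toy tree model. It is not libxml2's implementation and does not use lxml; the Node class and the standalone cleanup function are hypothetical, and a declaration is treated as redundant only when an ancestor already binds the same prefix to the same URI.

```python
class Node:
    """Toy element: a tag, its own xmlns declarations, and children."""

    def __init__(self, tag, nsdecls=None, children=None):
        self.tag = tag
        self.nsdecls = dict(nsdecls or {})    # prefix -> namespace URI
        self.children = list(children or [])


def cleanup(node, in_scope=None):
    """Drop declarations that repeat a binding already in scope.

    Returns the number of declarations removed from the subtree.
    """
    # Copy so that bindings added here never leak to sibling subtrees.
    in_scope = dict(in_scope or {})
    removed = 0
    for prefix, uri in list(node.nsdecls.items()):
        if in_scope.get(prefix) == uri:
            del node.nsdecls[prefix]          # an ancestor already binds it
            removed += 1
        else:
            in_scope[prefix] = uri            # new or overriding binding
    for child in node.children:
        removed += cleanup(child, in_scope)
    return removed
```

A declaration that rebinds a prefix to a different URI is kept, as is any declaration below it that restores the original binding, since removing either would change what the prefix means in that subtree.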
On 04.03.06 12:51:45, Stefan Behnel wrote:
Andreas Pakulat wrote:
On 01.03.06 09:59:21, Stefan Behnel wrote:
I updated the parser API according to the discussions (and the proposal of Fredrik) that we had in November. It now uses an XMLParser class that simply builds the libxml2 parse options in the constructor. I also added a global function "set_default_parser" that globally sets the default parser (options), or resets them if the supplied parser is None.
Just a short question before I waste hours to try this out:
Does this enable me to set "arbitrary" options on the XMLParser, so I could finally test the libxml-enhancement for removal of redundant namespaces (that is in CVS)? See http://bugzilla.gnome.org/show_bug.cgi?id=329347 for details.
Yes and no (my language has the wonderful word "jein" for that and I'd love to use it in English, too).
:-) Tell me...
No: The options must be available at compile time and mapped to keyword arguments by hand. Remember, we're talking about options of a C API here.
Ok. As you've probably guessed from the question: I have no deep knowledge about how lxml "wraps" libxml or how libxml itself works or is used.
If the answer is yes, how would I specify the XML_DOM_RECONNS_REMOVEREDUND option?
This is not a parser option. I think it would rather be an option for a serializer, right? Maybe not even that since it modifies the state of the XML structure...
That may be, actually I have no idea. The implementation is in tree.c for libxml2.
Since this is an experimental feature, it will not be supported in lxml 0.9 anyway. But if you could come up with a patch that implements it, it would allow us to integrate it later on and also help others who have the same problem and can afford to use a CVS version of libxml2.
I'd really love to, but I don't have the time to do that any time soon, especially as I first need to understand how libxml2 works and then how lxml works and uses libxml2. Maybe I can start something in May...

Andreas
--
Tomorrow will be cancelled due to lack of interest.
participants (2)
- Andreas Pakulat
- Stefan Behnel