xml.etree.ElementTree.IncrementalParser (was: ElementTree iterparse string)
[from python-ideas] Antoine Pitrou, 07.08.2013 08:04:
Take a look at IncrementalParser: http://docs.python.org/dev/library/xml.etree.elementtree.html#incremental-pa...
Hmm, that seems to be a somewhat recent addition (April 2013). I would have preferred hearing about it before it got added. I don't like the fact that it adds a second interface to iterparse() that allows injecting arbitrary content into the parser. You can now run iterparse() to read from a file and, at an arbitrary iteration position, send it a byte string to parse before it goes on reading more data from the file. Or take out some events before iteration continues. I think the implementation should be changed to make iterparse() return something that wraps an IncrementalParser, not something that is an IncrementalParser. Also, IMO it should mimic the interface of the TreeBuilder, which calls the data reception method "data()" and the termination method "close()". There is no reason to add yet another set of method names just to do what others do already. Stefan
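A minimal sketch of the wrapping design proposed here, assuming Python 3.4 or later. It uses XMLPullParser, the name under which the incremental parser was eventually released, rather than IncrementalParser as it stood in this thread. The function name wrapped_iterparse and its chunk_size parameter are hypothetical. Because the parser exists only inside the generator, a caller iterating over it cannot feed extra bytes or pull events out of band.

```python
import io
from xml.etree.ElementTree import XMLPullParser


def wrapped_iterparse(source, events=("end",), chunk_size=16 * 1024):
    """Hypothetical iterparse() variant that wraps, rather than is, a parser.

    `source` is assumed to be a binary file-like object. The parser is a
    private local, so the only thing callers see is an iterator of
    (event, element) pairs.
    """
    parser = XMLPullParser(events=events)
    while True:
        data = source.read(chunk_size)
        if not data:
            break
        parser.feed(data)
        # Hand out whatever events this chunk produced.
        yield from parser.read_events()
    parser.close()
    # Events can still be queued after the end of input; deliver them too.
    yield from parser.read_events()
```

For example, iterating over wrapped_iterparse(io.BytesIO(b"<root><item>1</item></root>")) yields the "end" events for item and then root, and there is no parser object exposed that could be fed out of band.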
Hi, On Thu, 08 Aug 2013 06:33:42 +0200, Stefan Behnel <stefan_ml@behnel.de> wrote:
[from python-ideas]
Antoine Pitrou, 07.08.2013 08:04:
Take a look at IncrementalParser: http://docs.python.org/dev/library/xml.etree.elementtree.html#incremental-pa...
Hmm, that seems to be a somewhat recent addition (April 2013). I would have preferred hearing about it before it got added.
I don't like the fact that it adds a second interface to iterparse() that allows injecting arbitrary content into the parser. You can now run iterparse() to read from a file and, at an arbitrary iteration position, send it a byte string to parse before it goes on reading more data from the file. Or take out some events before iteration continues.
I think the implementation should be changed to make iterparse() return something that wraps an IncrementalParser, not something that is an IncrementalParser.
That sounds reasonable. Do you want to post a patch? :-)
Also, IMO it should mimic the interface of the TreeBuilder, which calls the data reception method "data()" and the termination method "close()". There is no reason to add yet another set of method names just to do what others do already.
Well, the difference here is that after calling eof_received() you can (and should) still call events() once to get the last events. I think it would be weird if you could still do something useful with the object after calling close(). Also, the method names are not invented; they mimic the PEP 3156 stream protocols: http://www.python.org/dev/peps/pep-3156/#stream-protocols Regards Antoine.
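For illustration, a hypothetical asyncio protocol that forwards the PEP 3156 callbacks data_received() and eof_received() to an incremental parser. It assumes Python 3.4 or later and uses the released XMLPullParser, whose methods ended up as feed(), close() and read_events(); in that version, events not yet retrieved when the parser is closed can still be read afterwards. The class name XMLStreamProtocol and its seen attribute are made up for this sketch.

```python
import asyncio
from xml.etree.ElementTree import XMLPullParser


class XMLStreamProtocol(asyncio.Protocol):
    """Hypothetical protocol: maps stream callbacks onto a pull parser."""

    def __init__(self):
        self.parser = XMLPullParser(events=("start", "end"))
        self.seen = []  # (event, tag) pairs collected so far

    def data_received(self, data):
        # PEP 3156 callback name; the parser's own method is feed().
        self.parser.feed(data)
        self._drain()

    def eof_received(self):
        # End of stream: close the parser, then collect the last events,
        # which may only become available at this point.
        self.parser.close()
        self._drain()
        return False  # a false value lets the transport close itself

    def _drain(self):
        for event, elem in self.parser.read_events():
            self.seen.append((event, elem.tag))
```

The callbacks can be invoked directly, without a transport, to try it out: call data_received() with a few byte chunks, then eof_received(), and inspect seen.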
Antoine Pitrou, 08.08.2013 10:20:
On Thu, 08 Aug 2013 06:33:42 +0200, Stefan Behnel wrote:
Antoine Pitrou, 07.08.2013 08:04:
http://docs.python.org/dev/library/xml.etree.elementtree.html#incremental-pa...
I don't like the fact that it adds a second interface to iterparse() that allows injecting arbitrary content into the parser. You can now run iterparse() to read from a file and, at an arbitrary iteration position, send it a byte string to parse before it goes on reading more data from the file. Or take out some events before iteration continues.
I think the implementation should be changed to make iterparse() return something that wraps an IncrementalParser, not something that is an IncrementalParser.
That sounds reasonable. Do you want to post a patch? :-)
I attached it to the ticket that seems to have been the source of this addition. http://bugs.python.org/issue17741 Please note that the tulip mailing list is not an appropriate place to discuss additions to the XML libraries, and ElementTree in particular. Is there a way to get automatic notification when the XML component is assigned to a ticket? (Not that it would have helped in this case, as the component was missing from the ticket.)
Also, IMO it should mimic the interface of the TreeBuilder, which calls the data reception method "data()"
Oops, sorry. It's actually called feed().
and the termination method "close()". There is no reason to add yet another set of method names just to do what others do already.
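To make the existing naming concrete: in the module as it stands, XMLParser accepts raw input through feed() and finishes with close(), which returns the result of its target's close(), while TreeBuilder's data() receives character data for the current element, not raw XML. The sketch below uses only those documented calls; parse_chunks is a hypothetical helper name.

```python
from xml.etree.ElementTree import TreeBuilder, XMLParser


def parse_chunks(chunks):
    """Hypothetical helper: push byte chunks through XMLParser."""
    parser = XMLParser(target=TreeBuilder())
    for chunk in chunks:
        parser.feed(chunk)  # raw XML bytes go in through feed()
    # close() ends the stream and returns TreeBuilder.close(): the root.
    return parser.close()


# TreeBuilder works one level lower: it is driven with start/data/end.
tb = TreeBuilder()
tb.start("note", {})
tb.data("hello")  # character data for the current element, not raw XML
tb.end("note")
note = tb.close()
```

Note that once XMLParser.close() has returned the root, the parser is finished; that is the "nothing useful after close()" behaviour the thread refers to.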
Well, the difference here is that after calling eof_received() you can (and should) still call events() once to get the last events. I think it would be weird if you could still do something useful with the object after calling close().
Also, the method names are not invented; they mimic the PEP 3156 stream protocols: http://www.python.org/dev/peps/pep-3156/#stream-protocols
I see your point about close(). I assume your reasoning was to make the IncrementalParser an arbitrary stream end-point. However, it doesn't really make all that much sense to connect an arbitrary data source to it, as the source wouldn't know that, in addition to passing in data, it would also have to ask for events from time to time. I mean, you could do it, but then it would just fill up the memory with parser events and lose the actual advantages of incremental parsing. So, in a way, the whole point of the class is to *not* be an arbitrary stream end-point. Anyway, given that there isn't really the One Obvious Way to do it, maybe you should just add a docstring to the class (ahem), reference the stream protocol as the base for its API, and then rename it to IncrementalStreamParser. That would at least make it clear why it doesn't really fit with the rest of the module API (which was designed about a decade before PEP 3156) and instead uses its own naming scheme. Stefan
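A sketch of the consumer discipline described here, assuming Python 3.4 or later and the released XMLPullParser: events are drained after every feed() so the event queue does not grow with the input, and each handled element is cleared. The function count_records and the record tag are made up for illustration.

```python
from xml.etree.ElementTree import XMLPullParser


def count_records(chunks, tag="record"):
    """Hypothetical consumer: count <record> elements chunk by chunk."""
    parser = XMLPullParser(events=("end",))
    count = 0

    def drain():
        nonlocal count
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                count += 1
                # Drop the record's children, text and attributes. The
                # emptied element itself stays attached to its parent.
                elem.clear()

    for chunk in chunks:
        parser.feed(chunk)
        drain()  # keep the event queue short after every chunk
    parser.close()
    drain()  # collect any events still queued at end of input
    return count
```

If the drain() call inside the loop were dropped, every event would sit in the queue until the end of the input, which is the memory problem described above.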
On Fri, 09 Aug 2013 13:11:11 +0200, Stefan Behnel <stefan_ml@behnel.de> wrote:
I attached it to the ticket that seems to have been the source of this addition.
http://bugs.python.org/issue17741
Please note that the tulip mailing list is not an appropriate place to discuss additions to the XML libraries, and ElementTree in particular.
Well, the bug tracker is the main point of discussion, except that few people bothered discussing it.
Is there a way to get automatic notification when the XML component is assigned to a ticket? (Not that it would have helped in this case, as the component was missing from the ticket.)
You could ask to get included in the "experts" index: http://docs.python.org/devguide/experts.html (I doubt anyone would object to that)
Anyway, given that there isn't really the One Obvious Way to do it, maybe you should just add a docstring to the class (ahem), reference the stream protocol as the base for its API, and then rename it to IncrementalStreamParser.
I don't think there's any point in making the class name longer. Parsing XML incrementally is pretty much what it does. As for the docstring, uh, well, sure :-) (IMHO, IncrementalParser is the One Obvious Way to do incremental XML parsing in 3.4, but YMMV) Regards Antoine.
Antoine Pitrou, 09.08.2013 14:50:
On Fri, 09 Aug 2013 13:11:11 +0200, Stefan Behnel wrote:
I attached it to the ticket that seems to have been the source of this addition.
http://bugs.python.org/issue17741
Please note that the tulip mailing list is not an appropriate place to discuss additions to the XML libraries, and ElementTree in particular.
Well, the bug tracker is the main point of discussion, except that few people bothered discussing it.
The bug tracker is usually not a very visible place to start discussing changes. This change is a particularly good example; I've certainly seen others.
Is there a way to get automatic notification when the XML component is assigned to a ticket? (Not that it would have helped in this case, as the component was missing from the ticket.)
You could ask to get included in the "experts" index: http://docs.python.org/devguide/experts.html (I doubt anyone would object to that)
OK, please add me for xml.etree then. I used to get added to the nosy list for ET tickets during the 3.3 release cycle, but that seems to have stopped a while back. Since it's easier to erase my name from the nosy list than to add myself to a bug I've never heard about, I'm OK with being added for anything that relates to ET, basically, be it bug or feature.
Anyway, given that there isn't really the One Obvious Way to do it, maybe you should just add a docstring to the class (ahem), reference the stream protocol as the base for its API, and then rename it to IncrementalStreamParser.
I don't think there's any point in making the class name longer.
Agreed. It's not the class name that should be modified but the method names. I changed my mind and posted to the tracker. I also attached a new patch that changes the implementation to what I think it should look like. Stefan
Participants (2): Antoine Pitrou, Stefan Behnel