[Python-Dev] xml.etree.ElementTree.IncrementalParser

Stefan Behnel stefan_ml at behnel.de
Fri Aug 9 13:11:11 CEST 2013


Antoine Pitrou, 08.08.2013 10:20:
> On Thu, 08 Aug 2013 06:33:42 +0200,
> Stefan Behnel wrote:
>> Antoine Pitrou, 07.08.2013 08:04:
>>> http://docs.python.org/dev/library/xml.etree.elementtree.html#incremental-parsing
>>
>> I don't like the fact that it adds a second interface to iterparse()
>> that allows injecting arbitrary content into the parser. You can now
>> run iterparse() to read from a file, and at an arbitrary iteration
>> position, send it a byte string to parse from, before it goes reading
>> more data from the file. Or take out some events before iteration
>> continues.
>>
>> I think the implementation should be changed to make iterparse()
>> return something that wraps an IncrementalParser, not something that
>> is an IncrementalParser.
> 
> That sounds reasonable. Do you want to post a patch? :-)

I attached it to the ticket that seems to have been the source of this
addition.

http://bugs.python.org/issue17741
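
To illustrate the "wraps, not is" idea (this is a minimal sketch, not the
patch attached to the ticket): the wrapper below owns a pull parser but
exposes only iteration, so callers cannot inject data or pull events out
behind its back. It assumes the class as it is available in released Python
3.4+, xml.etree.ElementTree.XMLPullParser with feed(), close() and
read_events(); the IterParseWrapper name and the chunk_size parameter are
hypothetical.

```python
import io
from xml.etree.ElementTree import XMLPullParser


class IterParseWrapper:
    """Hypothetical iterparse()-like object that wraps a pull parser.

    The parser is a private attribute, so this object has no feed() or
    read_events() of its own: iteration is the only interface.
    """

    def __init__(self, source, events=("end",), chunk_size=16384):
        self._source = source
        self._parser = XMLPullParser(events)
        self._chunk_size = chunk_size

    def __iter__(self):
        while True:
            data = self._source.read(self._chunk_size)
            if not data:
                break
            self._parser.feed(data)
            # Hand out whatever events this chunk produced.
            yield from self._parser.read_events()
        self._parser.close()
        # Events generated while closing can still be read afterwards.
        yield from self._parser.read_events()


wrapper = IterParseWrapper(io.BytesIO(b"<root><a/><b/></root>"), chunk_size=4)
tags = [elem.tag for _, elem in wrapper]
```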

Please note that the tulip mailing list is not an appropriate place to
discuss additions to the XML libraries, and ElementTree in particular.

Is there a way to get automatic notification when the XML component is
assigned to a ticket? (Not that it would have helped in this case, as the
component was missing from the ticket.)


>> Also, IMO it should mimic the interface of the TreeBuilder, which
>> calls the data reception method "data()"

Oops, sorry. It's actually called feed().

>> and the termination method
>> "close()". There is no reason to add yet another set of method names
>> just to do what others do already.
> 
> Well, the difference here is that after calling eof_received() you can
> still (and should) call events() once to get the last events. I think
> it would be weird if you could still do something useful with the object
> after calling close().
> 
> Also, the method names are not invented, they mimic the PEP 3156
> stream protocols:
> http://www.python.org/dev/peps/pep-3156/#stream-protocols

I see your point about close(). I assume your reasoning was to make the
IncrementalParser an arbitrary stream end-point. However, it doesn't really
make all that much sense to connect an arbitrary data source to it, as the
source wouldn't know that, in addition to passing in data, it would also
have to ask for events from time to time. I mean, you could do it, but then
it would just fill up the memory with parser events and lose the actual
advantages of incremental parsing. So, in a way, the whole point of the
class is to *not* be an arbitrary stream end-point.
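
A rough sketch of that difference, assuming the pull parser as available in
released Python 3.4+ (XMLPullParser with feed(), close() and read_events());
the helper names chunks(), feed_only() and feed_and_drain() are made up for
illustration. A source that only pushes data leaves every event queued until
the end, while a caller that asks for events after each chunk only ever holds
a small batch.

```python
from xml.etree.ElementTree import XMLPullParser


def chunks(data, size):
    """Yield consecutive slices of data, each at most size bytes long."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def feed_only(data, size=32):
    """Push all data without asking for events: they all pile up."""
    parser = XMLPullParser(events=("end",))
    for chunk in chunks(data, size):
        parser.feed(chunk)
    parser.close()
    return len(list(parser.read_events()))


def feed_and_drain(data, size=32):
    """Ask for events after every chunk, so only small batches are held."""
    parser = XMLPullParser(events=("end",))
    total = largest_batch = 0

    def drain():
        nonlocal total, largest_batch
        batch = list(parser.read_events())
        total += len(batch)
        largest_batch = max(largest_batch, len(batch))
        for _, elem in batch:
            elem.clear()  # drop element content once it has been handled

    for chunk in chunks(data, size):
        parser.feed(chunk)
        drain()
    parser.close()
    drain()
    return total, largest_batch


doc = b"<root>" + b"<item/>" * 100 + b"</root>"
queued = feed_only(doc)
total, largest_batch = feed_and_drain(doc)
```

Both variants see the same number of events; only the second keeps the
number of pending events bounded by what a single chunk can produce.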

Anyway, given that there isn't really the One Obvious Way to do it, maybe
you should just add a docstring to the class (ahem), reference the stream
protocol as the base for its API, and then rename it to
IncrementalStreamParser. That would at least make it clear why it doesn't
really fit with the rest of the module API (which was designed about a decade
before PEP 3156) and instead uses its own naming scheme.
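
For comparison, here is a hedged sketch of what the stream-protocol naming
looks like next to the feed()/close() naming. The StreamProtocolParser class
is hypothetical; it simply maps the method names mentioned in this thread
(data_received(), eof_received(), events()) onto XMLPullParser as found in
released Python 3.4+.

```python
from xml.etree.ElementTree import XMLPullParser


class StreamProtocolParser:
    """Hypothetical adapter using PEP 3156 stream-protocol method names."""

    def __init__(self, events=None):
        self._parser = XMLPullParser(events)

    def data_received(self, data):
        # Same role as feed() elsewhere in the module.
        self._parser.feed(data)

    def eof_received(self):
        # Same role as close(), but the object remains useful afterwards:
        # the last events still have to be collected via events().
        self._parser.close()

    def events(self):
        return self._parser.read_events()


parser = StreamProtocolParser(events=("start", "end"))
parser.data_received(b"<root><child>te")
seen = [(event, elem.tag) for event, elem in parser.events()]
parser.data_received(b"xt</child></root>")
parser.eof_received()
seen += [(event, elem.tag) for event, elem in parser.events()]
```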

Stefan



