Re: [Python-Dev] [Python-checkins] cpython: whatsnew: XMLPullParser, plus some doc updates.

On 5 Jan 2014 12:54, "r.david.murray" <python-checkins@python.org> wrote:
http://hg.python.org/cpython/rev/069f88f4935f
changeset: 88308:069f88f4935f
user: R David Murray <rdmurray@bitdance.com>
date: Sat Jan 04 23:52:50 2014 -0500
summary: whatsnew: XMLPullParser, plus some doc updates.
I was confused by the text saying that read_events "iterated", since it actually returns an iterator (that's what a generator does) that the caller must then iterate. So I tidied up the language. I'm not sure what the sentence "Events provided in a previous call to read_events() will not be yielded again." is trying to convey, so I didn't try to fix that.
It's a mutating API - once the events have been retrieved, that's it, they're gone from the internal state. Suggestions for wording improvements welcome :)
Cheers, Nick.
Also fixed a couple more news items.
files:
  Doc/library/xml.etree.elementtree.rst | 23 +++++++++-----
  Doc/whatsnew/3.4.rst | 7 ++-
  Lib/xml/etree/ElementTree.py | 2 +-
  Misc/NEWS | 12 +++---
  4 files changed, 25 insertions(+), 19 deletions(-)
diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst
--- a/Doc/library/xml.etree.elementtree.rst
+++ b/Doc/library/xml.etree.elementtree.rst
@@ -105,12 +105,15 @@
    >>> root[0][1].text
    '2008'

+
+.. _elementtree-pull-parsing:
+
 Pull API for non-blocking parsing
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Most parsing functions provided by this module require to read the whole
-document at once before returning any result. It is possible to use a
-:class:`XMLParser` and feed data into it incrementally, but it's a push API that
+Most parsing functions provided by this module require the whole document
+to be read at once before returning any result. It is possible to use an
+:class:`XMLParser` and feed data into it incrementally, but it is a push API that
 calls methods on a callback target, which is too low-level and inconvenient for
 most needs. Sometimes what the user really wants is to be able to parse XML
 incrementally, without blocking operations, while enjoying the convenience of
@@ -119,7 +122,7 @@
 The most powerful tool for doing this is :class:`XMLPullParser`. It does not
 require a blocking read to obtain the XML data, and is instead fed with data
 incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML
-elements, call :meth:`XMLPullParser.read_events`. Here's an example::
+elements, call :meth:`XMLPullParser.read_events`. Here is an example::

    >>> parser = ET.XMLPullParser(['start', 'end'])
    >>> parser.feed('<mytag>sometext')
@@ -1038,15 +1041,17 @@

    .. method:: read_events()

-      Iterate over the events which have been encountered in the data fed to the
-      parser. This method yields ``(event, elem)`` pairs, where *event* is a
+      Return an iterator over the events which have been encountered in the
+      data fed to the
+      parser. The iterator yields ``(event, elem)`` pairs, where *event* is a
       string representing the type of event (e.g. ``"end"``) and *elem* is the
       encountered :class:`Element` object.

       Events provided in a previous call to :meth:`read_events` will not be
-      yielded again. As events are consumed from the internal queue only as
-      they are retrieved from the iterator, multiple readers calling
-      :meth:`read_events` in parallel will have unpredictable results.
+      yielded again. Events are consumed from the internal queue only when
+      they are retrieved from the iterator, so multiple readers iterating in
+      parallel over iterators obtained from :meth:`read_events` will have
+      unpredictable results.

    .. note::

diff --git a/Doc/whatsnew/3.4.rst b/Doc/whatsnew/3.4.rst
--- a/Doc/whatsnew/3.4.rst
+++ b/Doc/whatsnew/3.4.rst
@@ -1088,9 +1088,10 @@
 xml.etree
 ---------

-Add an event-driven parser for non-blocking applications,
-:class:`~xml.etree.ElementTree.XMLPullParser`.
-(Contributed by Antoine Pitrou in :issue:`17741`.)
+A new parser, :class:`~xml.etree.ElementTree.XMLPullParser`, allows a
+non-blocking applications to parse XML documents. An example can be
+seen at :ref:`elementtree-pull-parsing`. (Contributed by Antoine
+Pitrou in :issue:`17741`.)

 The :mod:`xml.etree.ElementTree` :func:`~xml.etree.ElementTree.tostring` and
 :func:`~xml.etree.ElementTree.tostringlist` functions, and the
diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py
--- a/Lib/xml/etree/ElementTree.py
+++ b/Lib/xml/etree/ElementTree.py
@@ -1251,7 +1251,7 @@
         self._close_and_return_root()

     def read_events(self):
-        """Iterate over currently available (event, elem) pairs.
+        """Return an iterator over currently available (event, elem) pairs.

         Events are consumed from the internal event queue as they are
         retrieved from the iterator.
diff --git a/Misc/NEWS b/Misc/NEWS
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -2193,14 +2193,14 @@
 - Issue #17555: Fix ForkAwareThreadLock so that size of after fork registry
   does not grow exponentially with generation of process.

-- Issue #17707: multiprocessing.Queue's get() method does not block for short
-  timeouts.
-
-- Isuse #17720: Fix the Python implementation of pickle.Unpickler to correctly
+- Issue #17707: fix regression in multiprocessing.Queue's get() method where
+  it did not block for short timeouts.
+
+- Issue #17720: Fix the Python implementation of pickle.Unpickler to correctly
   process the APPENDS opcode when it is used on non-list objects.

-- Issue #17012: shutil.which() no longer fallbacks to the PATH environment
-  variable if empty path argument is specified. Patch by Serhiy Storchaka.
+- Issue #17012: shutil.which() no longer falls back to the PATH environment
+  variable if an empty path argument is specified. Patch by Serhiy Storchaka.

 - Issue #17710: Fix pickle raising a SystemError on bogus input.
-- Repository URL: http://hg.python.org/cpython
_______________________________________________ Python-checkins mailing list Python-checkins@python.org https://mail.python.org/mailman/listinfo/python-checkins

On Tue, 07 Jan 2014 01:22:21 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 5 Jan 2014 12:54, "r.david.murray" <python-checkins@python.org> wrote:
http://hg.python.org/cpython/rev/069f88f4935f
changeset: 88308:069f88f4935f
user: R David Murray <rdmurray@bitdance.com>
date: Sat Jan 04 23:52:50 2014 -0500
summary: whatsnew: XMLPullParser, plus some doc updates.
I was confused by the text saying that read_events "iterated", since it actually returns an iterator (that's what a generator does) that the caller must then iterate. So I tidied up the language. I'm not sure what the sentence "Events provided in a previous call to read_events() will not be yielded again." is trying to convey, so I didn't try to fix that.
It's a mutating API - once the events have been retrieved, that's it, they're gone from the internal state. Suggestions for wording improvements welcome :)
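A minimal sketch of the behaviour described here, assuming CPython's `xml.etree.ElementTree` (3.4 or later); the sample document and the variable names are made up for illustration. `read_events()` hands back an iterator, and whatever is pulled from that iterator is removed from the parser's internal queue, so a later call finds nothing left:

```python
import xml.etree.ElementTree as ET

# Ask for both 'start' and 'end' events (the default is 'end' only).
parser = ET.XMLPullParser(['start', 'end'])
parser.feed('<root><a/><b/></root>')  # illustrative document
parser.close()  # events not yet retrieved can still be read after close()

# read_events() returns an iterator; nothing is consumed until it is advanced.
events = parser.read_events()
first_pass = [(event, elem.tag) for event, elem in events]

# The events retrieved above are gone from the internal queue.
second_pass = [(event, elem.tag) for event, elem in parser.read_events()]

print(first_pass)
print(second_pass)
```

The second list comes back empty because the first pass already drained the queue; feeding more data to a parser that has not been closed would make new events available again.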
Well, my guess as to what it meant was roughly: "An Event will be yielded exactly once regardless of how many read_events iterators are processed." Looking at the code, though, I'm not sure that's actually true. The code does not appear to be thread-safe. Of course, it isn't intended to be used in a threaded context, but the docs don't quite make that explicit. I imagine that's the intent of the statement about "parallel" reading, but it doesn't actually say that the code is not thread-safe. It reads more as if it is warning that the order of retrieval would be unpredictable.
--David
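To make the "exactly once" reading concrete, here is a single-threaded sketch, again assuming CPython's `xml.etree.ElementTree` and an invented sample document. Two iterators obtained from `read_events()` are advanced alternately; since both draw from the same internal queue, each event turns up in only one of them. How the events end up split between the two readers is an implementation detail, and the sketch says nothing about what happens when threads are involved.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(['end'])
parser.feed('<root><a/><b/><c/></root>')  # illustrative document, four 'end' events
parser.close()

# Two iterators over the same parser share one internal event queue.
reader_one = parser.read_events()
reader_two = parser.read_events()

seen_one, seen_two = [], []
for _ in range(2):
    event, elem = next(reader_one)
    seen_one.append(elem.tag)
    event, elem = next(reader_two)
    seen_two.append(elem.tag)

# Every element tag appears once across the two readers, never in both.
print(seen_one, seen_two)
```

Interleaving in one thread is deterministic only because the calls are sequential; with real concurrent readers there is no such ordering to rely on.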
participants (2)
- Nick Coghlan
- R. David Murray