[Python-bugs-list] PRIVATE: xmllib.XMLParser.handle_data() seems to handle ']' incorrectly (PR#63)

fdrake@acm.org
Wed, 25 Aug 1999 11:01:49 -0400 (EDT)


mkes@ra.rockwell.com writes:
 > Full_Name: Miroslav Kes
 > Version: 1.5.2
 > OS: FreeBSD 3.2
 > Submission from: (NULL) (205.175.223.11)
...
 > I have experienced following strange behaviour of
 > xmllib.XMLParser.handle_data()
 > method.
 > If I have XML tag whose body contains ']' the handle_data() method considers 
 > the ']' as separator (or what ?) and splits the whole text into pieces:


  While this may be confusing, it is not a bug.  The API guarantees
that handle_data() will be called for all textual data, but not that
each call will represent a maximal run of data.  Such splits can also
happen at arbitrary points in the input stream when the parser object
is fed the input in chunks, which is commonly done if the input is
large or comes from a network connection.
  To get the text content of an element, do something like this:

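import xmllib

class MyXMLParser(xmllib.XMLParser):
    # ELEMENT in the method names stands in for the element's tag name.
    __saving = 0

    def start_ELEMENT(self, attrs):
        # Start collecting text for this element.
        self.__saving = 1
        self.__saved_text = ''

    def end_ELEMENT(self):
        text = self.__saved_text
        self.__saving = 0
        # do something with text

    def handle_data(self, data):
        # May be called several times for one run of text, so accumulate.
        if self.__saving:
            self.__saved_text = self.__saved_text + data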

  You may want to look at the implementations of save_bgn() and
save_end() in htmllib.HTMLParser; similar utility methods may prove
convenient, since they avoid a little ugliness and make the facility
available to subclasses of your new class.
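
  As a rough sketch of such utility methods, the following puts
save_bgn()/save_end()-style helpers on a base class and uses them from
a subclass.  It is only an illustration under stated assumptions:
xmllib is not available in current Python releases, so the sketch uses
xml.parsers.expat as a stand-in (its character data handler makes the
same "no maximal run" non-guarantee), and the names SavingParser,
TitleParser, handle_start and handle_end are hypothetical, not part of
any library.

```python
import xml.parsers.expat


class SavingParser:
    """Hypothetical base class offering save_bgn()/save_end() helpers."""

    def __init__(self):
        self._saved = None  # None means "not currently saving"
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self.handle_start
        self._parser.EndElementHandler = self.handle_end
        self._parser.CharacterDataHandler = self.handle_data

    def save_bgn(self):
        # Begin collecting character data.
        self._saved = []

    def save_end(self):
        # Stop collecting and return everything seen since save_bgn().
        text = "".join(self._saved or [])
        self._saved = None
        return text

    def handle_data(self, data):
        # May be called several times for one run of text, so accumulate.
        if self._saved is not None:
            self._saved.append(data)

    def handle_start(self, name, attrs):
        pass  # subclasses override

    def handle_end(self, name):
        pass  # subclasses override

    def feed(self, data):
        self._parser.Parse(data, False)

    def close(self):
        self._parser.Parse("", True)


class TitleParser(SavingParser):
    """Collects the text of every <title> element (assumed tag name)."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_start(self, name, attrs):
        if name == "title":
            self.save_bgn()

    def handle_end(self, name):
        if name == "title":
            self.titles.append(self.save_end())


# Feed the document in two chunks, splitting inside the element text.
parser = TitleParser()
for chunk in ("<doc><title>a]b", " and more</title></doc>"):
    parser.feed(chunk)
parser.close()
print(parser.titles)
```

  However many times handle_data() ends up being called, the pieces
are joined only in save_end(), so the subclass never sees a partial
run of text.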


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives