[Python-bugs-list] PRIVATE: xmllib.XMLParser.handle_data() seems to handle ']' incorrectly (PR#63)

mkes@ra.rockwell.com mkes@ra.rockwell.com
Wed, 25 Aug 1999 08:37:50 -0400 (EDT)


Full_Name: Miroslav Kes
Version: 1.5.2
OS: FreeBSD 3.2
Submission from: (NULL) (205.175.223.11)


Hi!

I have run into the following strange behaviour of the
xmllib.XMLParser.handle_data()
method.
If an XML element's body contains ']', the parser seems to treat the ']' as a
separator and calls handle_data() several times, splitting the text into pieces:

the source code (xmlparser.py):
-------------------
import xmllib
import sys

class MyXMLParser( xmllib.XMLParser ):
    
    i = 0

    def __init__( self ):
        xmllib.XMLParser.__init__( self )
        self.i = 0

    def unknown_starttag( self, tag, attributes ):
        print 'start tag: ' + tag + str( attributes )

    def unknown_endtag( self, tag ):
        print 'end tag: ' + tag

    def handle_data( self, data ):
        print self.i
        print type( data )
        print data
        self.i = self.i + 1

    def run( self, filename ):
        self.__init__()
        file = open( filename, 'r' )
        self.feed( file.read())
        self.close()
        file.close()

----------------------------------
the XML file sample (test.xml):
----------------------------------
<?xml version="1.0"?>
<TEST>
  <NAME>
    Conversion from web speed [fpm] to motor speed [rpm] wrong (reference calibration)
  </NAME>
</TEST>

---------------------------------
runtime:
---------------------------------
odysseus:/usr/home/mira/engine> python
Python 1.5.2 (#2, May 11 1999, 17:14:37)  [GCC 2.7.2.1] on freebsd3
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import xmlparser
>>> p = xmlparser.MyXMLParser()
>>> p.run("test.xml")
0
<type 'string'>


start tag: TEST{}
1
<type 'string'>


start tag: NAME{}
2
<type 'string'>
Conversion from web speed [fpm
3
<type 'string'>
]
4
<type 'string'>
 to motor speed [rpm
5
<type 'string'>
]
6
<type 'string'>
 wrong (reference calibration)
end tag: NAME
7
<type 'string'>


end tag: TEST
8
<type 'string'>


>>>
----------------------------------------

I think this is a bug because the XML specification treats ']' as a valid
character in character data; only the sequence ']]>' is excluded:
W3C spec. -  [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
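
To make the reading of production [14] concrete, here is a minimal sketch that
checks a string against it. The helper name is_char_data is hypothetical and
not part of xmllib; the check only mirrors the production as quoted above (no
'<', no '&', and no ']]>' sequence) and does not use any parser.

```python
import re

# First half of production [14]: any run of characters other than '<' and '&'.
_NO_MARKUP = re.compile(r"[^<&]*\Z")


def is_char_data(text):
    """Return True if text matches CharData as quoted from the XML spec.

    Hypothetical helper for illustration only; it is not part of xmllib.
    """
    # Second half of the production: the sequence ']]>' is excluded.
    return _NO_MARKUP.match(text) is not None and "]]>" not in text


sample = ("Conversion from web speed [fpm] to motor speed [rpm] "
          "wrong (reference calibration)")
ok = is_char_data(sample)
```

Under this reading a lone ']' is ordinary character data, so the text of the
NAME element in the sample satisfies the production.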

When I parse the example above with other XML parsers (MSXML and one
Java-based parser), they read it correctly.
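
A possible workaround in the meantime, sketched under the assumption that the
pieces only need to be joined back together: collect everything passed to
handle_data() and join it when the next tag starts or ends. The class
DataBuffer and its flush() method are hypothetical names. The sketch does not
import xmllib so that it can be run on its own; in a real parser subclass,
handle_data() would append to the buffer and unknown_starttag() and
unknown_endtag() would call flush().

```python
class DataBuffer:
    """Collects handle_data() pieces until the next tag boundary.

    Hypothetical helper for illustration; not part of xmllib.
    """

    def __init__(self):
        self.pieces = []  # pieces received since the last tag boundary
        self.texts = []   # joined text runs, in document order

    def handle_data(self, data):
        # Same signature as the parser callback: just remember the piece.
        self.pieces.append(data)

    def flush(self):
        # Intended to be called at every start tag and end tag.
        if self.pieces:
            self.texts.append("".join(self.pieces))
            self.pieces = []


# Replay text pieces like the ones between <NAME> and </NAME> in the report.
buf = DataBuffer()
for piece in ["Conversion from web speed [fpm", "]",
              " to motor speed [rpm", "]",
              " wrong (reference calibration)"]:
    buf.handle_data(piece)
buf.flush()  # what an end-tag handler would do
```

After the flush, buf.texts holds the NAME text as a single string again.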


Mira