[XML-SIG] [ pyxml-Bugs-438397 ] truncated content passed to characters()
noreply@sourceforge.net
Tue, 03 Jul 2001 17:12:02 -0700
Bugs item #438397, was opened at 2001-07-03 17:12
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=438397&group_id=6473
Category: SAX
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mr. Codepage (codepage)
Assigned to: Nobody/Anonymous (nobody)
Summary: truncated content passed to characters()
Initial Comment:
I am parsing a pretty simple 500k XML file.
The bad output lines in question look like this:
c <--- truncated, should be com.xxxxxx.ejb.domain.intfc
com/xxxxxx/ejb/domain/intfc/AdverseReactionType.java
com.xxxxxx.e <--- truncated
com/xxxxxx/ejb/service/hsif/msgHandler/intfc/HLSevenHandler.java
This is an XML file that describes the source pool at
a certain release point in time.
I rewrote the small script in Java with Xerces and it
works fine.
The XML file does NOT contain truncated data. If I
extract the portions of the datafile above that are
having problems and put it in its own xml file, it
works fine (with the code below). It is only this
configuration of the datafile that is truncating the
value of content passed to characters(). The
XML file is well-formed.
from xml.sax import saxutils

class packageScan(saxutils.DefaultHandler):
    def __init__(self):
        self.showText = 0
        self.grabPath = 0
        self.Path = ""

    def startElement(self, name, attrs):
        # Flag the next characters() call as belonging to this element.
        if name == "package":
            self.showText = 1
        elif name == "path":
            self.grabPath = 1

    def characters(self, content):
        if self.showText == 1:
            # Print only the suspiciously short (truncated) package names.
            if len(content) < 13:
                print content
                print self.Path
            self.showText = 0
        if self.grabPath == 1:
            self.Path = content
            self.grabPath = 0
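
A possible workaround, not a confirmed diagnosis: SAX parsers are allowed to deliver one run of text through several characters() calls, so a handler that reads only the first call may see a partial value. The sketch below buffers the pieces and joins them in endElement. It assumes the standard library's xml.sax rather than PyXML 0.6.5, and that <package> and <path> elements are not nested inside each other. The names BufferedPackageScan and scan are made up for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BufferedPackageScan(ContentHandler):
    """Collects the complete text of each <package> and <path> element."""

    TRACKED = ("package", "path")

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = None   # list of text pieces while inside a tracked element
        self.results = []    # (element name, complete text) pairs

    def startElement(self, name, attrs):
        if name in self.TRACKED:
            self.chunks = []

    def characters(self, content):
        # May be called more than once per text run, so only append here.
        if self.chunks is not None:
            self.chunks.append(content)

    def endElement(self, name):
        # The text is complete only once the element closes.
        if name in self.TRACKED and self.chunks is not None:
            self.results.append((name, "".join(self.chunks)))
            self.chunks = None


def scan(xml_bytes):
    """Parse XML given as bytes and return the collected (name, text) pairs."""
    handler = BufferedPackageScan()
    xml.sax.parseString(xml_bytes, handler)
    return handler.results
```

If the parser does split the text, a handler written this way should still report the whole package name and path, because nothing is read until the closing tag arrives.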
Python 2.1
PyXML 0.6.5
I would be happy to test any workarounds, patches, etc.
----------------------------------------------------------------------