[XML-SIG] [ pyxml-Bugs-438397 ] truncated content passed to characters()

noreply@sourceforge.net noreply@sourceforge.net
Tue, 03 Jul 2001 17:12:02 -0700


Bugs item #438397, was opened at 2001-07-03 17:12
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=438397&group_id=6473

Category: SAX
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mr. Codepage (codepage)
Assigned to: Nobody/Anonymous (nobody)
Summary: truncated content passed to characters()

Initial Comment:
While parsing a fairly simple 500 KB XML file, characters() is
sometimes passed truncated content.

The bad output lines in question look like

c <--- truncated, should be com.xxxxxx.ejb.domain.intfc
com/xxxxxx/ejb/domain/intfc/AdverseReactionType.java
com.xxxxxx.e <--- truncated
com/xxxxxx/ejb/service/hsif/msgHandler/intfc/HLSevenHandler.java

This is an xml file that describes the source pool at 
a certain release point in time.

I rewrote the small script in Java with Xerces and it 
works fine.

The XML file does NOT contain truncated data. If I 
extract the portions of the datafile above that are 
having problems and put it in its own xml file, it 
works fine (with the code below). It is only this 
configuration of the datafile that is truncating the 
value of content passed to characters(). The 
XML file is well formed.

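from xml.sax import saxutils

class packageScan(saxutils.DefaultHandler):
	def __init__(self):
		self.showText = 0	# set when a <package> start tag is seen
		self.grabPath = 0	# set when a <path> start tag is seen
		self.Path = ""		# text from the most recent <path>
	def startElement(self, name, attrs):
		if name == "package":
			self.showText = 1
		elif name == "path":
			self.grabPath = 1
	def characters(self, content):
		# Only the first characters() call after each start tag is examined.
		if self.showText == 1:
			# Report only suspiciously short package names.
			if len(content) < 13:
				print content
				print self.Path
			self.showText = 0
		if self.grabPath == 1:
			self.Path = content
			self.grabPath = 0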

Python 2.1
PyXML 0.6.5

I would be happy to test any workarounds, patches, etc.
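
A possible workaround sketch, not a confirmed diagnosis: SAX parsers are
allowed to deliver one run of text through several characters() calls, so a
handler that acts on the first call alone may see only a fragment. The
version below buffers the pieces and joins them in endElement(). It is
written for current Python 3, where xml.sax.ContentHandler stands in for
saxutils.DefaultHandler. The PackageScan attribute names, the packages list,
and the sample document are illustrative assumptions, and the sketch assumes
<package> and <path> are never nested inside each other.

```python
import xml.sax


class PackageScan(xml.sax.ContentHandler):
    """Collect the complete text of <package> and <path> elements."""

    TRACKED = ("package", "path")

    def __init__(self):
        super().__init__()
        self._chunks = None   # text pieces while inside a tracked element
        self.path = ""        # text of the most recent <path>
        self.packages = []    # (package text, most recent path) pairs

    def startElement(self, name, attrs):
        if name in self.TRACKED:
            self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text: just accumulate.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if self._chunks is None or name not in self.TRACKED:
            return
        text = "".join(self._chunks)
        self._chunks = None
        if name == "path":
            self.path = text
        else:
            self.packages.append((text, self.path))


# Hypothetical sample input; the real file layout is not shown in this report.
SAMPLE = (b"<files><file><path>com/example/Foo.java</path>"
          b"<package>com.example</package></file></files>")

handler = PackageScan()
xml.sax.parseString(SAMPLE, handler)
for package, path in handler.packages:
    print(package, path)
```

If the joined text is still short with this handler, the splitting of
characters() calls is not the cause and the problem lies elsewhere.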


