[XML-SIG] [ pyxml-Bugs-1165107 ] sgmlop drops trailing partial tokens
SourceForge.net
noreply at sourceforge.net
Thu Mar 17 11:25:12 CET 2005
Bugs item #1165107, was opened at 2005-03-17 11:25
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=1165107&group_id=6473
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Magnus Lie Hetland (mlh)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmlop drops trailing partial tokens
Initial Comment:
Partial entities in the middle of the text are
(appropriately) reported as text by sgmlop. However, if
the partial entity is placed at the end of the text, it
isn't reported. This behavior would be understandable
when using the feed method alone, but it also occurs
with the parse method (which closes the parser after
the feed), and that is unfortunate. It means (as far as
I can see) that the tail of the input is simply
ignored. One especially bad example is if the input
contains -- or even begins with -- a stray '<'
character, without later containing a '>' character.
Then everything from that point on is ignored.
The following snippet demonstrates the problem:
from xml.parsers.sgmlop import SGMLParser, XMLParser, XMLUnicodeParser

class Handler:
    def handle_data(self, data):
        print 'Data:', repr(data)

for text in ['<', '{', '<foo bar < " ', '</foo',
             '< ', '{ ', 'frozz <foo bar < " ',
             'bar </foo']:
    for parser in [SGMLParser(), XMLParser(),
                   XMLUnicodeParser()]:
        parser.register(Handler())
        print '%s with %s:' % (repr(parser), repr(text))
        parser.parse(text)
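
For comparison, the behavior the report asks for can be sketched with a toy tokenizer. This is not sgmlop's implementation, and it does not use sgmlop at all: the class ToyParser, its callback argument, and its deliberately crude notion of a tag (anything from '<' to the next '>') are assumptions made purely for illustration. The point is only the split between feed, which may hold back a possibly incomplete token, and close, which should flush whatever is left as text.

```python
class ToyParser:
    """Toy incremental tokenizer (hypothetical; NOT sgmlop's code)."""

    def __init__(self, handle_data):
        self.handle_data = handle_data  # callback that receives text chunks
        self.buffer = ''                # input that has not been reported yet

    def feed(self, text):
        self.buffer += text
        while self.buffer:
            start = self.buffer.find('<')
            if start == -1:
                # No markup at all: everything buffered is plain text.
                self.handle_data(self.buffer)
                self.buffer = ''
                return
            if start > 0:
                # Report the text that precedes the '<'.
                self.handle_data(self.buffer[:start])
                self.buffer = self.buffer[start:]
            end = self.buffer.find('>')
            if end == -1:
                # Possibly an incomplete tag: keep it and wait for more input.
                return
            # A complete tag: this toy simply skips it.
            self.buffer = self.buffer[end + 1:]

    def close(self):
        # End of input: the buffered tail can never be completed,
        # so report it as text instead of silently dropping it.
        if self.buffer:
            self.handle_data(self.buffer)
            self.buffer = ''

    def parse(self, text):
        # Mirrors the feed-then-close sequence described in the report.
        self.feed(text)
        self.close()
```

With this sketch, calling feed alone on 'bar </foo' reports only 'bar ' and keeps '</foo' buffered, while parse reports both pieces, so no part of the input is lost once the parser is closed.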