[XML-SIG] SAX characters() output on multiple lines for non-ascii
Fred Drake
fdrake at acm.org
Sun Feb 3 04:03:20 CET 2008
On Feb 2, 2008, at 6:04 PM, woodcock wrote:
> I am starting with SAX and am trying to parse a file that contains non-ascii
> characters. The xml file uses 'ISO-8859-1'. When it parses text containing
> non-ascii characters the output is across multiple lines.
This is a fundamental property of the SAX interface: it doesn't require
the parser to split a run of text across several characters() calls, but
it explicitly allows it. If you want something that buffers the text and
delivers it in larger chunks, that could be written as a proxy content
handler.
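
A minimal sketch of such a proxy, assuming Python's standard xml.sax
module. The names TextBufferingProxy, Collector and collect_text are
made up for illustration and are not part of any library. The proxy
only forwards element and end-of-document events and does not handle
namespace mode (startElementNS/endElementNS), so treat it as a starting
point rather than a complete implementation:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextBufferingProxy(ContentHandler):
    """Hypothetical proxy: joins consecutive characters() calls into one."""

    def __init__(self, downstream):
        super().__init__()
        self._downstream = downstream  # the handler that wants whole text runs
        self._chunks = []              # text pieces seen since the last flush

    def _flush(self):
        # Deliver everything buffered so far as a single characters() call.
        if self._chunks:
            self._downstream.characters("".join(self._chunks))
            self._chunks = []

    def characters(self, content):
        # Hold the text back; the parser may call this several times in a row.
        self._chunks.append(content)

    # Any non-text event ends the current run of text, so flush first.
    def startElement(self, name, attrs):
        self._flush()
        self._downstream.startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        self._downstream.endElement(name)

    def endDocument(self):
        self._flush()
        self._downstream.endDocument()


class Collector(ContentHandler):
    """Example downstream handler: records each characters() call it gets."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def characters(self, content):
        self.texts.append(content)


def collect_text(xml_bytes):
    """Parse xml_bytes and return the text runs seen by the downstream handler."""
    collector = Collector()
    xml.sax.parseString(xml_bytes, TextBufferingProxy(collector))
    return collector.texts
```

With this in place, the downstream handler should see one characters()
call per run of text between two element events, however the parser
chose to split it. The cost is that each run is held in memory in full
before it is passed on.
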
It might be nice if one were provided out of the box, since this is a
common request, but the basic issue is that a very large amount of text
may arrive between two non-text events, and one of the advantages of SAX
is that it doesn't require loading large portions of the document into
memory if the application doesn't need them.
-Fred
--
Fred Drake <fdrake at acm.org>