[XML-SIG] Re: Re: SAX encoding and special characters
Fredrik Lundh
fredrik at pythonware.com
Sun Apr 18 04:14:05 EDT 2004
"Thomas" wrote:
> Yes, this was the first thing I tried. Unfortunately, I got:
> Traceback (most recent call last):
>   File "./xmlparser_new.py", line 210, in ?
>     saxparser.parseString(document)
> AttributeError: ExpatParser instance has no attribute 'parseString'
>
> Do you have an idea how to fix it? (Yes, I understand that it's not
> supported by expat; unfortunately, I don't have experience with it.)
iirc, parse takes either a file name or a file object, so the following
might work:
import StringIO
...
saxparser.parse(StringIO.StringIO(document))
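On Python 3, where the StringIO module was replaced by io.StringIO and io.BytesIO, a minimal sketch of the same idea might look like this. It assumes the document is held as bytes (so expat can honour the encoding declaration itself); TextCollector and parse_string are hypothetical names, while make_parser(), setContentHandler() and parse() are the standard xml.sax calls.

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: gathers the character data of every element."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # expat may deliver one text node in several pieces
        self.chunks.append(content)


def parse_string(document: bytes) -> str:
    """Hypothetical helper: parse an in-memory XML document via parse()."""
    parser = xml.sax.make_parser()
    handler = TextCollector()
    parser.setContentHandler(handler)
    # parse() accepts a file object, so wrap the in-memory bytes in one
    parser.parse(io.BytesIO(document))
    return "".join(handler.chunks)


doc = '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\xe9</p>'.encode("iso-8859-1")
text = parse_string(doc)
```

The handler receives Unicode strings regardless of the declared input encoding, which is why no manual decoding step appears.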
Python's SAX implementation also supports incremental parsing; I think
you should be able to simply do:
saxparser.feed(document)
saxparser.close()
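A minimal sketch of the incremental route, assuming the data arrives in pieces (simulated here by slicing a bytes object). ElementCounter and parse_incrementally are hypothetical names; feed() and close() are the IncrementalParser methods provided by the expat-based parser that xml.sax.make_parser() returns.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts start tags by element name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def parse_incrementally(document: bytes, chunk_size: int = 16) -> dict:
    """Hypothetical helper: feed the document to the parser chunk by chunk."""
    parser = xml.sax.make_parser()  # expat-based IncrementalParser
    handler = ElementCounter()
    parser.setContentHandler(handler)
    for start in range(0, len(document), chunk_size):
        parser.feed(document[start:start + chunk_size])
    parser.close()  # signal end of input; runs the final well-formedness check
    return handler.counts


counts = parse_incrementally(b"<list><item/><item/><item/></list>")
```

Calling close() matters: it tells expat the input is complete, so a truncated document is reported as an error and the handler's endDocument() is called.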
:::
and yes, since you have to read the entire document into a string, you can
extract the encoding from that string. here's a fairly robust RE that should
do the trick:
import re
m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", data)
if m:
    encoding = m.group(1)
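Wrapped up as a function for Python 3, where a document read in binary mode is a bytes object, the same pattern has to be a bytes regex. In this sketch, sniff_encoding is a hypothetical name; it falls back to UTF-8 when there is no declaration, which is the XML default when neither a declaration nor a byte-order mark is present. It does not attempt byte-order-mark detection.

```python
import re

# Bytes version of the pattern above: a str pattern cannot be matched
# against bytes in Python 3.
XML_ENCODING_DECL = re.compile(rb"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]")


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: return the declared encoding, or `default`."""
    m = XML_ENCODING_DECL.match(data)  # match() anchors at the start of data
    if m:
        return m.group(1).decode("ascii")
    # No declaration: assume the XML default (BOMs are not inspected here)
    return default
```

Using match() rather than search() is deliberate: an XML declaration is only valid at the very start of the document.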
(a much better approach is to stick to a standard encoding in the output
files, no matter what encoding the XML files use. XML is Unicode, so the
encoding of the input files shouldn't matter.)
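One way to follow that advice, sketched under the assumption that the output is simply the input re-serialized from SAX events: the parser decodes each input according to its own declaration, and the standard library's XMLGenerator (a ContentHandler from xml.sax.saxutils) writes everything back out in one fixed encoding. The module-level function xml.sax.parseString(string, handler) drives the parse; reencode is a hypothetical helper name.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


def reencode(document: bytes, out_encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: re-serialize an XML document in one fixed encoding."""
    out = io.BytesIO()
    # XMLGenerator writes every SAX event back out as XML, encoded with
    # out_encoding, and declares that encoding in the output.
    writer = XMLGenerator(out, encoding=out_encoding)
    # The parser decodes the input according to its own declaration, so the
    # writer only ever sees Unicode strings.
    xml.sax.parseString(document, writer)
    return out.getvalue()


latin1_doc = '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\xe9</p>'.encode("iso-8859-1")
utf8_doc = reencode(latin1_doc)
```

Whatever encoding each input file declares, every output file comes out in the same encoding, so downstream code never needs to sniff it.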
</F>