[XML-SIG] Re: Re: SAX encoding and special characters
fredrik at pythonware.com
Sun Apr 18 04:14:05 EDT 2004
> Yes, this was the 1st thing I tried out. Unfortunately I got:
> Traceback (most recent call last):
> File "./xmlparser_new.py", line 210, in ?
> AttributeError: ExpatParser instance has no attribute 'parseString'
> Do you have an idea how to fix it? (yes, I understand that it's not
> supported by expat - unfortunately I don't have experience with it).
iirc, parse takes either a file name or a file object, so wrapping the
string in a file-like object (e.g. StringIO) and passing it to parse
should work.
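a minimal sketch of the file-object route, written for current Python 3 (where the in-memory wrapper for raw bytes is io.BytesIO). the handler class, the helper name and the sample document are made up for illustration and are not from the original script:

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # content arrives as a unicode string, whatever the file encoding was
        self.chunks.append(content)


def parse_bytes(data):
    """Parse an in-memory document by handing parse() a file object."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))  # a file object instead of a file name
    return "".join(handler.chunks)


# assumed sample input: a Latin-1 encoded document held in memory
sample = b'<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'
text = parse_bytes(sample)
```

the parser reads the encoding from the XML declaration itself, so the caller never has to decode the bytes by hand.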
Python's SAX implementation also supports incremental parsing; I think
you should be able to simply call feed with the string, followed by
close.
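a rough sketch of the incremental route, again for current Python 3. it assumes the default parser returned by make_parser is the expat-based one, which provides feed and close; the counting handler, the helper name and the chunk size are only there for illustration:

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_incrementally(data, chunk_size=16):
    """Feed the document to the parser piece by piece, then close it."""
    handler = ElementCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for start in range(0, len(data), chunk_size):
        parser.feed(data[start:start + chunk_size])
    parser.close()  # signals end of input and flushes the parser
    return handler.count


element_count = parse_incrementally(b"<root><item/><item/><item/></root>")
```

if the whole document is already in one string, a single feed call followed by close is enough; the chunking only shows that the split points don't matter.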
and yes, since you have to read the entire document into a string, you can
extract the encoding from that string. here's a fairly robust RE that
does the trick:
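import re
m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", data)
# no declared encoding: fall back to utf-8 (an assumption; a BOM may mean utf-16)
encoding = m.group(1) if m else "utf-8"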
(a much better approach is to stick to a standard encoding in the output
no matter what encoding the XML files use. XML is unicode, and the XML
encoding shouldn't matter).
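to illustrate the point, here is a small sketch for current Python 3 that uses the module-level xml.sax.parseString function. the handler, the helper name, the two sample documents and the choice of utf-8 as the output encoding are all assumptions made for the example:

```python
import xml.sax

OUTPUT_ENCODING = "utf-8"  # assumed choice of a single standard output encoding


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def extract_text(data):
    """Return the document's text as a unicode string."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


# the same content stored in two different file encodings
latin1_doc = ('<?xml version="1.0" encoding="iso-8859-1"?>'
              '<doc>caf\u00e9</doc>').encode("iso-8859-1")
utf8_doc = ('<?xml version="1.0" encoding="utf-8"?>'
            '<doc>caf\u00e9</doc>').encode("utf-8")

# encode once, on output, regardless of the input encoding
outputs = [extract_text(d).encode(OUTPUT_ENCODING) for d in (latin1_doc, utf8_doc)]
```

both documents produce the same output bytes, so the input encoding never has to be extracted at all.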