[XML-SIG] Re: Re: SAX encoding and special characters

Fredrik Lundh fredrik at pythonware.com
Sun Apr 18 04:14:05 EDT 2004

"Thomas" wrote:

> Yes, this was the first thing I tried. Unfortunately, I got:
> Traceback (most recent call last):
>   File "./xmlparser_new.py", line 210, in ?
>     saxparser.parseString(document)
> AttributeError: ExpatParser instance has no attribute 'parseString'
> Do you have an idea how to fix it? (yes, I understand that it's not
> supported by expat; unfortunately, I don't have experience with it).

iirc, parse takes either a file name or a file object, so the following
might work:

    import StringIO
    saxparser.parse(StringIO.StringIO(document))

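In Python 3 the `StringIO` module is gone; `io.BytesIO` (for bytes) or `io.StringIO` (for text) fills the same role. A minimal Python 3 sketch of the file-object approach, assuming the document is already in memory as bytes; the handler class `TextCollector` and the sample document are hypothetical:

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records element names and character data."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.text = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        # Character data may arrive in several pieces, so collect them all.
        self.text.append(content)


# Sample document, assumed to be held in memory as bytes.
document = '<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'.encode("iso-8859-1")

handler = TextCollector()
parser = xml.sax.make_parser()  # Expat-based reader by default
parser.setContentHandler(handler)
# parse() accepts a file object, so wrap the bytes in an in-memory stream.
parser.parse(io.BytesIO(document))

print(handler.elements)                       # ['doc']
print("".join(handler.text) == "caf\xe9")     # True
```

The `xml.sax` module also has a module-level `parseString(string, handler)` function that does this wrapping for you; it lives on the module, not on the parser object, which is why calling it as a method fails.
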
Python's SAX implementation also supports incremental parsing; I think
you should be able to simply do:

    saxparser.feed(document)
    saxparser.close()

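A self-contained Python 3 sketch of incremental parsing. The parser returned by `xml.sax.make_parser()` implements `feed()` and `close()`, so an in-memory document can be passed in one call or in pieces. The handler class `ItemCollector`, the sample document, and the 10-byte chunk size are made up for illustration:

```python
import xml.sax


class ItemCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text of each <item> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self.items.append("")

    def endElement(self, name):
        if name == "item":
            self._in_item = False

    def characters(self, content):
        # Text may be split across calls, so append instead of assigning.
        if self._in_item:
            self.items[-1] += content


# Stand-in for the document string from the message.
document = b'<?xml version="1.0"?><doc><item>one</item><item>two</item></doc>'

handler = ItemCollector()
parser = xml.sax.make_parser()  # an IncrementalParser (Expat by default)
parser.setContentHandler(handler)

# Feed the bytes in arbitrary chunks; Expat buffers partial tokens.
for start in range(0, len(document), 10):
    parser.feed(document[start:start + 10])
parser.close()  # tells the parser the input is complete

print(handler.items)  # ['one', 'two']
```

Feeding in chunks is mainly useful when the data arrives from a socket or a large file; for a string you already hold, a single `feed()` followed by `close()` is enough.
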
and yes, since you have to read the entire document into a string, you can
extract the encoding from that string.  here's a fairly robust RE that
should do the trick:

    import re

    m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", document)
    if m:
        encoding = m.group(1)

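A Python 3 version of the same check wrapped in a small function, assuming the raw document is held as bytes (so the pattern is compiled as a bytes pattern). The name `sniff_encoding` and the UTF-8 fallback are assumptions for illustration; UTF-8 is what the XML spec requires when there is neither a declaration nor a byte-order mark. Like the RE above, it does not look at a byte-order mark, so UTF-16 input is not detected:

```python
import re

# Same pattern as above, as a bytes pattern so it runs on undecoded data.
XML_DECL_ENCODING = re.compile(rb"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]")


def sniff_encoding(data, default="utf-8"):
    """Hypothetical helper: return the encoding named in the XML
    declaration, or `default` if no encoding is declared."""
    m = XML_DECL_ENCODING.match(data)
    if m:
        # Encoding names are ASCII, so decoding the match is safe.
        return m.group(1).decode("ascii")
    return default


raw = '<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'.encode("iso-8859-1")
print(sniff_encoding(raw))                              # iso-8859-1
print(raw.decode(sniff_encoding(raw)).endswith("caf\xe9</doc>"))  # True
print(sniff_encoding(b"<doc/>"))                        # utf-8
```
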
(a much better approach is to stick to a standard encoding in the output
no matter what encoding the XML files use.  XML is Unicode: the parser
decodes the input for you, so the source encoding shouldn't matter).

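A Python 3 sketch of that approach, assuming UTF-8 as the fixed output encoding. `xml.sax.saxutils.XMLGenerator` is a ContentHandler that re-serializes the events it receives, so documents stored in different encodings come out in one encoding. The function name `reencode` and the sample documents are hypothetical:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


def reencode(document, out_encoding="utf-8"):
    """Hypothetical helper: parse an XML document in whatever encoding it
    declares and serialize it again in out_encoding."""
    out = io.BytesIO()
    # XMLGenerator writes an XML declaration naming out_encoding and
    # encodes all the text it receives with that encoding.
    generator = XMLGenerator(out, encoding=out_encoding)
    xml.sax.parseString(document, generator)
    return out.getvalue()


latin1_doc = '<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="utf-8"?><doc>caf\xe9</doc>'.encode("utf-8")

# The handler only ever sees decoded Unicode text, so both inputs
# produce identical UTF-8 output.
print(reencode(latin1_doc) == reencode(utf8_doc))  # True
```
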
