[XML-SIG] Re: Re: SAX encoding and special characters
Thomas
thomasj at eworld.hu
Sun Apr 18 05:15:14 EDT 2004
Thanks Fredrik!
Both suggested solutions work like a charm.
Thank you for your help!
Regards,
Thomas
Sunday, April 18, 2004, 10:14:05 AM, Fredrik wrote:
FL> "Thomas" wrote:
>> FL> </F>
>> Yes, this was the first thing I tried. Unfortunately I got:
>> Traceback (most recent call last):
>>   File "./xmlparser_new.py", line 210, in ?
>>     saxparser.parseString(document)
>> AttributeError: ExpatParser instance has no attribute 'parseString'
>>
>> Do you have an idea how to fix it? (yes, I understand that it's not
>> supported by expat - unfortunately I don't have experience with it).
FL> iirc, parse takes either a file name or a file object, so the following
FL> might work:
FL>     import StringIO
FL>     ...
FL>     saxparser.parse(StringIO.StringIO(document))
FL> Python's SAX implementation also supports incremental parsing; I think
FL> you should be able to simply do:
FL>     saxparser.feed(document)
FL>     saxparser.close()
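
[Editorial sketch, not part of the original message. It shows both
suggestions above in Python 3, where the StringIO module has been folded
into io. The TextCollector handler and the two function names are
hypothetical and exist only for illustration; the document is assumed to
be held as bytes.]

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # SAX delivers text as str, already decoded by the parser.
        self.chunks.append(content)


def parse_via_file_object(document: bytes) -> str:
    """Wrap the in-memory document in a file-like object for parse()."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "".join(handler.chunks)


def parse_via_feed(document: bytes) -> str:
    """Use the incremental interface: feed() the data, then close()."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed(document)
    parser.close()
    return "".join(handler.chunks)
```

Both functions should yield the same text for the same input; feed() can
also be called repeatedly with successive chunks if the document arrives
in pieces.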
FL> :::
FL> and yes, since you have to read the entire document into a string, you can
FL> extract the encoding from that string. here's a fairly robust RE that
FL> should do the trick:
FL>     m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", data)
FL>     if m:
FL>         encoding = m.group(1)
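
[Editorial sketch, not part of the original message. It wraps the regular
expression above in a small Python 3 helper that works on bytes. The name
sniff_encoding and the "utf-8" fallback are assumptions made for
illustration. Because re.match anchors at the start of the data, a
leading byte order mark or whitespace would cause the fallback to be
returned.]

```python
import re

# Same pattern as in the message, compiled for bytes input.
_DECL_ENCODING = re.compile(rb"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]")


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    m = _DECL_ENCODING.match(data)
    if m:
        # Encoding names are plain ASCII, so this decode is safe.
        return m.group(1).decode("ascii")
    return default
```

For example, sniff_encoding(open(path, "rb").read()) could be called
before parsing, where path is whatever file is being processed.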
FL> (a much better approach is to stick to a standard encoding in the output
FL> files, no matter what encoding the XML files use. XML is unicode, and the
FL> XML encoding shouldn't matter).
FL> </F>
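
[Editorial sketch, not part of the original message. It illustrates the
closing advice: the parser hands text to the handler as Unicode, so the
output can always be written in one fixed encoding, whatever the input
declared. UTF-8 is chosen here as an assumption; Utf8Writer and
transcode_text are hypothetical names.]

```python
import io
import xml.sax


class Utf8Writer(xml.sax.ContentHandler):
    """Hypothetical handler that writes character data out as UTF-8."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def characters(self, content):
        # content is a Unicode str regardless of the source encoding.
        self.out.write(content.encode("utf-8"))


def transcode_text(document: bytes) -> bytes:
    """Parse a document in any declared encoding; return its text as UTF-8."""
    out = io.BytesIO()
    xml.sax.parseString(document, Utf8Writer(out))
    return out.getvalue()
```

With this arrangement the declared encoding of the input never has to be
extracted by hand, since only the parser needs to know it.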