[XML-SIG] Re: Re: SAX encoding and special characters

Thomas thomasj at eworld.hu
Sun Apr 18 05:15:14 EDT 2004


Thanks Fredrik!

Both suggested solutions work like a charm.

Thank you for your help!

Regards,
        Thomas

Sunday, April 18, 2004, 10:14:05 AM, Fredrik wrote:

FL> "Thomas" wrote:

>> FL> </F>
>> Yes, this was the first thing I tried out. Unfortunately I got:
>> Traceback (most recent call last):
>>   File "./xmlparser_new.py", line 210, in ?
>>     saxparser.parseString(document)
>> AttributeError: ExpatParser instance has no attribute 'parseString'
>>
>> Do you have an idea how to fix it? (yes, I understand that it's not
>> supported by expat - unfortunately I don't have experience with it).

FL> iirc, parse takes either a file name or a file object, so the following
FL> might work:

FL>     import StringIO
FL>     ...
FL>     saxparser.parse(StringIO.StringIO(document))
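
A minimal sketch of the same in-memory approach on Python 3, where StringIO lives in the io module and an undecoded document is passed as bytes through io.BytesIO. The TextCollector handler and the sample document are made up for illustration, not taken from xmlparser_new.py:

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # content arrives as a decoded (unicode) string
        self.chunks.append(content)


# Sample document held in memory as raw ISO-8859-1 bytes (an assumption).
document = (
    '<?xml version="1.0" encoding="iso-8859-1"?><name>Zolt\xe1n</name>'
).encode("iso-8859-1")

handler = TextCollector()
saxparser = xml.sax.make_parser()
saxparser.setContentHandler(handler)

# parse() accepts a file-like object, so wrap the bytes in BytesIO.
saxparser.parse(io.BytesIO(document))

text = "".join(handler.chunks)
```

The AttributeError above comes from parseString being a module-level function, xml.sax.parseString(document, handler), rather than a method on the parser instance.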

FL> Python's SAX implementation also supports incremental parsing; I think
FL> you should be able to simply do:

FL>     saxparser.feed(document)
FL>     saxparser.close()
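
A rough Python 3 sketch of the incremental route, feeding the document in small pieces instead of one call. The ElementCounter handler, the sample document, and the chunk size are assumptions chosen for the example:

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


document = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>'
    b"<root><item>a</item><item>b</item></root>"
)

handler = ElementCounter()
saxparser = xml.sax.make_parser()
saxparser.setContentHandler(handler)

# Feed arbitrary slices; the parser buffers incomplete tokens itself.
chunk_size = 16  # arbitrary, for illustration only
for start in range(0, len(document), chunk_size):
    saxparser.feed(document[start:start + chunk_size])

# close() signals end of input and triggers endDocument().
saxparser.close()

element_count = handler.count
```

Feeding the whole document in a single feed() call, as in the two lines above, is the same thing with one chunk.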

FL> :::

FL> and yes, since you have to read the entire document into a string, you can
FL> extract the encoding from that string.  here's a fairly robust RE that should
FL> do the trick:

FL>     import re
FL>     ...
FL>     m = re.match(r"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]", data)
FL>     if m:
FL>         encoding = m.group(1)
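
The same RE wrapped in a small helper, as a sketch for Python 3 where the raw document is bytes and so the pattern has to be a bytes pattern too. The name sniff_encoding and the UTF-8 fallback are assumptions for the example. The fallback follows the XML default when no encoding is declared, but this sketch does not look at a byte order mark:

```python
import re

# Same pattern as above, compiled as a bytes pattern.
ENCODING_RE = re.compile(rb"<\?xml[^>]+encoding=['\"]([-\w]+)['\"]")


def sniff_encoding(data, default="utf-8"):
    """Return the encoding named in the XML declaration, or default."""
    m = ENCODING_RE.match(data)
    if m:
        # Encoding names are plain ASCII, so this decode is safe.
        return m.group(1).decode("ascii")
    return default


declared = sniff_encoding(b'<?xml version="1.0" encoding="iso-8859-2"?><a/>')
single_quoted = sniff_encoding(b"<?xml version='1.0' encoding='UTF-8'?><a/>")
undeclared = sniff_encoding(b"<a/>")
```

Since re.match anchors at the start of the data, a document with leading whitespace or a byte order mark before the declaration falls through to the default.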

FL> (a much better approach is to stick to a standard encoding in the output
FL> files, no matter what encoding the XML files use.  XML is unicode, and the
FL> XML encoding shouldn't matter).
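
To illustrate that last point, a small Python 3 sketch: the handler always receives unicode text, so writing it out in one fixed encoding gives identical output whatever the input declared. UTF-8 as the output encoding, the Utf8TextWriter class, and the two sample documents are assumptions for the example:

```python
import io
import xml.sax


class Utf8TextWriter(xml.sax.ContentHandler):
    """Hypothetical handler that writes character data as UTF-8."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def characters(self, content):
        # content is already unicode; pick one output encoding and keep it.
        self.out.write(content.encode("utf-8"))


def extract_text(document):
    """Parse raw XML bytes and return the text content as UTF-8 bytes."""
    out = io.BytesIO()
    xml.sax.parseString(document, Utf8TextWriter(out))
    return out.getvalue()


# The same content stored in two different input encodings.
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?><n>caf\xe9</n>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="utf-8"?><n>caf\xe9</n>'
).encode("utf-8")

from_latin1 = extract_text(latin1_doc)
from_utf8 = extract_text(utf8_doc)
```

With this arrangement the declared encoding never needs to be extracted at all.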

FL> </F>




