codecs, Swedish characters, and XML...don't mix? (repost)
Andrew Kuchling
akuchlin at mems-exchange.org
Fri May 11 16:00:40 EDT 2001
Michael Hammill <mike at pdc.kth.se> writes:
> bad_xml_pi = u'<?xml version="1.0" ?>'
> good_xml_pi = u'<?xml version="1.0" encoding="UTF-8" ?>'
> good_doctype = u'<!DOCTYPE ...... I'll spare you ...>'
> (new_result, n) = re.subn(bad_xml_pi, good_xml_pi + good_doctype, file_string)
? is a special character in regular expressions (as is .), so the pattern
never matches the literal XML declaration. You should either use
file_string.replace(bad_xml_pi, good_xml_pi + good_doctype),
or run bad_xml_pi through re.escape() before passing it to re.subn().
>>> re.escape
<function escape at 0x8136e14>
>>> re.escape('<?xml version="1.0" ?>')
'\\<\\?xml\\ version\\=\\"1\\.0\\"\\ \\?\\>'
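To make the difference concrete, here is a minimal sketch comparing the two fixes. The DOCTYPE string and the contents of file_string are placeholders made up for illustration, since the original post elides them; only re.subn(), re.escape(), and str.replace() are taken from the discussion above.

```python
import re

# Placeholder inputs: the real DOCTYPE and document were elided in the
# original post, so these values are assumptions for illustration only.
bad_xml_pi = '<?xml version="1.0" ?>'
good_xml_pi = '<?xml version="1.0" encoding="UTF-8" ?>'
good_doctype = '<!DOCTYPE example>'
file_string = bad_xml_pi + '<example>\u00e5\u00e4\u00f6</example>'

replacement = good_xml_pi + good_doctype

# Unescaped pattern: "?" is read as a quantifier and "." as a wildcard,
# so the pattern does not match the literal declaration.
_, n_unescaped = re.subn(bad_xml_pi, replacement, file_string)

# Escaped pattern: every special character is matched literally.
via_regex, n_escaped = re.subn(re.escape(bad_xml_pi), replacement, file_string)

# Plain string replacement: no regular expression involved at all.
via_replace = file_string.replace(bad_xml_pi, replacement)

print(n_unescaped, n_escaped, via_regex == via_replace)
```

For a fixed string like this, the plain replace() call is probably the simpler choice, since it also avoids any backslash processing in the replacement text.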
Arguably the minidom .toxml() method should provide a way to select an
encoding, though.
--amk