[XML-SIG] unicode
Mark McEahern
marklists@mceahern.com
Fri, 10 Aug 2001 10:23:55 -0700
>If you have the need to parse Unicode strings, I'd recommend to encode
>them first. If you have an encoding declaration in the document, you
>should encode them using that declaration; otherwise you should encode
>them as UTF-8.
Yes, that's the workaround I've been using:
from xml.dom.minidom import parseString
foo = u'<foo/>'
foo = foo.encode('utf-8')  # encode to a byte string before parsing
fooDoc = parseString(foo)
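For anyone reading this later, here is the same workaround wrapped in a small helper. This is only a sketch: parse_text and its encoding parameter are names I made up, and it is written for present-day Python 3, where minidom already accepts text, so it illustrates the "encode first, using the declared encoding if there is one" advice rather than reproducing the failure.

```python
from xml.dom import minidom


def parse_text(text, encoding="utf-8"):
    """Hypothetical helper: encode text to bytes, then hand it to minidom.

    `encoding` should match the document's encoding declaration if it
    has one; UTF-8 is the fallback when there is no declaration.
    """
    if isinstance(text, str):
        text = text.encode(encoding)
    return minidom.parseString(text)


# No declaration: encode as UTF-8.
plain_doc = parse_text("<foo/>")

# With a declaration: encode using the declared encoding.
declared = '<?xml version="1.0" encoding="iso-8859-1"?><foo>\xe9</foo>'
declared_doc = parse_text(declared, encoding="iso-8859-1")
```

The caller has to know the declared encoding here; the helper does not try to sniff it out of the document.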
By the way, I discovered a clearer demonstration of the problem:
import xml.dom.minidom
foo = '<foo/>'
fooDoc = xml.dom.minidom.parseString(foo)
fooXml = fooDoc.toxml()
try:
    fooDoc2 = xml.dom.minidom.parseString(fooXml)
except TypeError:
    print 'Round-tripping failed.'
>If you can come up with a patch that gets this right, it would be much
>appreciated.
Well, I noticed the error was happening in pulldom.py:
  File "c:\python21\_xmlplus\dom\pulldom.py", line 316, in parseString
    buf = StringIO(string)
TypeError: expected string, unicode found
parseString looks like this:
def parseString(string, parser=None):
    try:
        from cStringIO import StringIO
    except ImportError:
        from StringIO import StringIO

    bufsize = len(string)
    buf = StringIO(string)
    if not parser:
        parser = xml.sax.make_parser()
    return DOMEventStream(buf, parser, bufsize)
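One direction a patch could take is to encode Unicode input before it ever reaches the buffer. The following is a rough sketch of that idea, not the actual pulldom code: parse_string_encoded and its encoding parameter are hypothetical names, and it is written against present-day Python 3 (io.BytesIO standing in for the string buffer), so treat it as an illustration of the approach only.

```python
import io
import xml.sax
from xml.dom import pulldom


def parse_string_encoded(string, parser=None, encoding="utf-8"):
    """Hypothetical variant of parseString that encodes text up front."""
    if isinstance(string, str):
        # Assumption: the caller passes the document's declared encoding,
        # or relies on the UTF-8 default when there is no declaration.
        string = string.encode(encoding)
    bufsize = len(string)
    buf = io.BytesIO(string)
    if not parser:
        parser = xml.sax.make_parser()
    return pulldom.DOMEventStream(buf, parser, bufsize)


# Collect the (event, node) pairs produced for a tiny document.
events = list(parse_string_encoded("<foo/>"))
```

This keeps the rest of the function unchanged, so the buffer only ever sees byte strings regardless of what the caller passed in.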
I thought, "Gee, what if I force this to use StringIO instead of
cStringIO..."
Guess what? It worked without manually encoding the string as 'utf-8'.
Does that mean the problem is in cStringIO? I guess I'm still not clear on
what the exact problem is. Is it that cStringIO only accepts ... what?
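My working guess, and it is only a guess going by the TypeError above, is that the C implementation takes 8-bit strings only, while the pure-Python StringIO is less strict about what it stores. Neither module exists in Python 3, so the sketch below uses io.BytesIO and io.StringIO as stand-ins to show the same kind of mismatch between a byte buffer and Unicode text. It is an analogy, not a reproduction of the Python 2.1 behaviour.

```python
import io

text = "<foo/>"

# A byte buffer refuses text outright and raises TypeError.
try:
    io.BytesIO(text)
    bytes_buffer_accepts_text = True
except TypeError:
    bytes_buffer_accepts_text = False

# The same buffer type is happy once the text has been encoded.
encoded_buffer = io.BytesIO(text.encode("utf-8"))

# A text buffer stores the Unicode string as-is.
text_buffer = io.StringIO(text)
```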
Thanks,
// mark