this string, what encoding?

Anonymous User nospam at home.com
Fri Aug 10 14:39:12 EDT 2001


Hi, is there a way to find out what encoding a string is in?  (This
question--heck, this entire post--may not make any sense, in which case it
merely reveals my ignorance.)

Here's why I'm driven to ask this question:

 import xml.dom.minidom
 foo = '<foo/>'
 fooDoc = xml.dom.minidom.parseString(foo)
 fooXml = fooDoc.toxml()
 try:
     fooDoc2 = xml.dom.minidom.parseString(fooXml)
 except TypeError:
     print 'Round-tripping failed.'

The reason this fails is that cStringIO, which is used by
xml.dom.pulldom.parseString, rejects unicode strings, as noted here:


http://sourceforge.net/tracker/index.php?func=detail&aid=216388&group_id=5470&atid=105470

So I figured I'd work around this by adding the following modification to
pulldom.py:

def parseString(string, parser=None):
    try:
        from cStringIO import StringIO
        # <fixme type="workaround">
        # This is a temporary workaround, since cStringIO doesn't accept
        # unicode input, as noted here:
        # http://sourceforge.net/tracker/index.php?func=detail&aid=216388&group_id=5470&atid=105470
        string = string.encode('utf-8')
        # </fixme>
    except ImportError:
        from StringIO import StringIO

Rather than just always encoding the string, I'd like to be able to do
something like this (pseudocode):

    # if the string is not already UTF-8, make it so
    if string.encoding != 'utf-8':
        string = string.encode('utf-8')
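
As far as I can tell, a string object carries no record of its encoding, so
the closest runnable version of that pseudocode I can come up with is a type
check rather than an encoding check. This is only a sketch, assuming the sole
case that matters is a unicode object versus a byte string; to_utf8_bytes and
text_type are hypothetical names of mine, not anything in pulldom:

```python
try:
    text_type = unicode  # Python 2: the unicode type
except NameError:
    text_type = str      # Python 3: str is the unicode text type


def to_utf8_bytes(value):
    """Return value as UTF-8 bytes if it is unicode text, else unchanged.

    Hypothetical helper: it cannot tell what encoding a byte string is
    already in, it only avoids re-encoding something that is not unicode.
    """
    if isinstance(value, text_type):
        return value.encode('utf-8')
    return value
```

With something like this, parseString could call to_utf8_bytes(string) before
handing the result to StringIO, and byte strings would pass through untouched.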

Thanks,

// mark





