[XML-SIG] XML Unicode and UTF-8

Uche Ogbuji uche.ogbuji at fourthought.com
Tue Aug 10 03:11:11 CEST 2004


It looks as if I should have read the whole thread before posting. 
Martin's been a great help, but I still have a couple of observations.

On Thu, 2004-08-05 at 06:22, n.youngman at ntlworld.com wrote:
> OK. I read the opaque documentation^W^W fine manual for a while, then googled for a while, and finally decided to just hack about with what I had.

I personally think the Python/Unicode docs are pretty good, but Unicode
is *hard*.  No getting around that.


> I now have
> 
>     charset_tag.appendChild( doc.createTextNode( segment[1] ) )
>     unicode = segment[0].decode( segment[1] ).encode( "utf-8")
>     unicode_tag = doc.createElement( 'unicode' )
>     unicode_tag.appendChild( doc.createTextNode( unicode ) )


I wouldn't use "unicode" as a variable name if I were you, since it
shadows the built-in unicode type in Python 2.2 and up.
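
A minimal sketch of the rename, keeping the expression from the quoted
code as-is. The sample value of "segment" (a pair of raw bytes and
charset name) and the name "utf8_bytes" are my own assumptions for
illustration, not something from the original script:

```python
# Hypothetical sample data: (raw bytes, name of the charset they are in).
segment = (b'caf\xe9', 'iso-8859-1')

# Same decode/encode expression as in the quoted code, but bound to a
# name that does not hide the built-in unicode type.
utf8_bytes = segment[0].decode(segment[1]).encode('utf-8')
```

Any name that says what the value holds would do just as well.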

I suggest

    unicode_tag = doc.createElement( u'unicode' )

rather than

    unicode_tag = doc.createElement( 'unicode' )

Remember that XML element and attribute names are Unicode too, even
though the set of characters allowed in names is smaller than the set
allowed in character data.
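
To make that concrete, here is a small sketch using xml.dom.minidom
from the standard library. The element name and text are made-up
examples: U+00E9 is a letter and so is fine in a name, while U+00A9
(the copyright sign) is allowed only in character data. Note that
minidom itself does not check names for validity, so this only shows
that Unicode names pass through intact:

```python
from xml.dom.minidom import Document

doc = Document()

# A non-ASCII element name, given as a Unicode string.
elem = doc.createElement(u'caf\u00e9')

# Character data may use characters that are not permitted in names.
elem.appendChild(doc.createTextNode(u'\u00a9 2004'))
doc.appendChild(elem)

# Serialize to UTF-8; both the name and the text are encoded on output.
xml_bytes = doc.toxml(encoding='utf-8')
```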


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Decomposition, Process, Recomposition - http://www.xml.com/pub/a/2004/07/28/py-xml.html
Perspective on XML: Steady steps spell success with Google - http://www.adtmag.com/article.asp?id=9663
Managing XML libraries - http://www.adtmag.com/article.asp?id=9160
Commentary on "Objects. Encapsulation. XML?" - http://www.adtmag.com/article.asp?id=9090
Harold's Effective XML - http://www.ibm.com/developerworks/xml/library/x-think25.html
A survey of XML standards - http://www-106.ibm.com/developerworks/xml/library/x-stand4/


